BC OpenAI Version 01
Changelog
Keep track of changes to the OpenAI API. You can also track changes via our public OpenAPI
specification which is used to generate our SDKs, documentation, and more. This changelog is
maintained on a best-effort basis and may not reflect every change that is made.
API: Operational
ChatGPT: Operational
Labs: Operational
Playground: Operational
Past Incidents

Feb 17, 2024
No incidents reported today.

Elevated error rate impacting API services (including Assistants and Fine-tuning)
Resolved - This incident has been resolved. (Feb 14, 12:40 PST)
Monitoring - A fix has been implemented and we are monitoring the results. (Feb 14, 10:38 PST)
Update - The team is monitoring error rates closely, and we are continuing to work on a fix for this issue. (Feb 13, 08:30 PST)
Identified - The issue has been identified and a fix is being implemented. (Feb 13, 07:36 PST)

Monitoring - A fix has been implemented and we are monitoring the results. (Feb 14, 08:09 PST)
Monitoring - A fix has been implemented and we are monitoring the results. (Feb 13, 13:46 PST)
Monitoring - A fix has been implemented and we are monitoring the results. (Feb 12, 10:19 PST)
Monitoring - A fix has been implemented and we are monitoring the results. (Feb 10, 06:55 PST)

Feb 9, 2024
Identified - The issue has been identified and a fix is being implemented. (Feb 9, 15:50 PST)
Monitoring - A fix has been implemented and we are monitoring the results. (Feb 9, 12:14 PST)

Feb 8, 2024
No incidents reported.

Feb 7, 2024
No incidents reported.

Feb 6, 2024
No incidents reported.

Feb 5, 2024
Monitoring - A fix has been implemented and we are monitoring the results. (Feb 5, 16:47 PST)

Feb 3, 2024
No incidents reported.
Introducing Sora, our text-to-video model
We're teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction. Sora can generate videos up to a minute long… (read more)

Welcome to community.openai.com!
Welcome to the OpenAI Developer community! Looking for ChatGPT support? Head to https://help.openai.com!
This community resource is for all users of the various OpenAI developer platforms. We welcome discussion of the… (read more)
Usage policies
Updated
January 10, 2024
We’ve updated our usage policies to be more readable and added service-specific guidance.
We aim for our tools to be used safely and responsibly, while maximizing your control over how you
use them. By using our services, you agree to adhere to our policies.
We have established universal policies applicable to all our services, as well as specific policies for
builders who use ChatGPT or our API to create applications for themselves or others. Violating our
policies could result in action against your account, up to suspension or termination. We also work to
make our models safer and more useful, by training them to refuse harmful instructions and reduce
their tendency to produce harmful content.
We believe that learning from real-world use is a critical component of creating and releasing
increasingly safe AI systems. We cannot predict all beneficial or abusive uses of our technology, so
we proactively monitor for new abuse trends. Our policies will evolve based on what we learn over
time.
Universal Policies
To maximize innovation and creativity, we believe you should have the flexibility to use our services
as you see fit, so long as you comply with the law and don’t harm yourself or others. When using any
OpenAI service, like ChatGPT, labs.openai.com, and the OpenAI API, these rules apply:
1. Comply with applicable laws – for example, don’t compromise the privacy of others, engage in
regulated activity without complying with applicable regulations, or promote or engage in any
illegal activity, including the exploitation or harm of children and the development or distribution
of illegal substances, goods, or services.
2. Don’t use our service to harm yourself or others – for example, don’t use our services to promote
suicide or self-harm, develop or use weapons, injure others or destroy property, or engage in
unauthorized activities that violate the security of any service or system.
3. Don’t repurpose or distribute output from our services to harm others – for example, don’t share
output from our services to defraud, scam, spam, mislead, bully, harass, defame, discriminate
based on protected attributes, sexualize children, or promote violence, hatred or the suffering of
others.
4. Respect our safeguards – don't circumvent safeguards or safety mitigations in our services unless
supported by OpenAI (e.g., domain experts in our Red Teaming Network) or related to research
conducted in accordance with our Sharing & Publication Policy.
We report apparent child sexual abuse material (CSAM) to the National Center for Missing and
Exploited Children.
We recognize that our API introduces new capabilities with scalable impact, so we have service-
specific policies that apply to all use of our APIs in addition to our Universal Policies.
We use a combination of automated systems, human review, and user reports to find and assess
GPTs that potentially violate our policies. Violations can lead to actions against the content or your
account, such as warnings, sharing restrictions, or ineligibility for inclusion in GPT Store or
monetization.
GPT Store
We want to make sure that GPTs in the GPT Store are appropriate for all users. For example, GPTs
that contain profanity in their names or that depict or promote graphic violence are not allowed in
our Store. We also don’t allow GPTs dedicated to fostering romantic companionship or performing
regulated activities.
These policies may be enforced automatically at submission time or applied retroactively upon
further review.
Changelog
2024-01-10: We've updated our Usage Policies to be clearer and provide more service-specific
guidance.
2023-02-15: We’ve combined our use case and content policies into a single set of usage policies,
and have provided more specific guidance on what activity we disallow in industries we’ve
considered high risk.
2022-11-09: We no longer require you to register your applications with OpenAI. Instead, we'll be
using a combination of automated and manual methods to monitor for policy violations.
2022-10-25: Updated App Review process (devs no longer need to wait for approval after
submitting as long as they comply with our policies). Moved to an outcomes-based approach and
updated Safety Best Practices.
2022-06-07: Refactored into categories of applications and corresponding requirements
2022-03-09: Refactored into “App Review”
2022-01-19: Simplified copywriting and article writing/editing guidelines
2021-11-15: Addition of “Content guidelines” section; changes to bullets on almost always approved
uses and disallowed uses; renaming document from “Use case guidelines” to “Usage guidelines”.
2021-08-04: Updated with information related to code generation
2021-03-12: Added detailed case-by-case requirements; small copy and ordering edits
2021-02-26: Clarified the impermissibility of Tweet and Instagram generators
Updated
January 10, 2024
Trust and privacy are at the core of our mission at OpenAI. We’re committed to privacy and security
for ChatGPT Team, ChatGPT Enterprise, and our API Platform.
Our commitments
Can OpenAI support my compliance with GDPR and other privacy laws?
How does OpenAI handle data retention and monitoring for API usage?
Who can view stored API inputs, outputs, and fine-tuning data?
This notebook guides you through using Neon Serverless Postgres as a vector database for
OpenAI embeddings. It demonstrates how to:
4. Use Neon with the pgvector extension to perform vector similarity search.
Prerequisites
1. A Neon Postgres database. You can create an account and set up a project with a ready-to-
use neondb database in a few simple steps. For instructions, see Sign up and Create your
first project.
2. A connection string for your Neon database. You can copy it from the Connection Details
widget on the Neon Dashboard. See Connect from any application.
3. The pgvector extension. Install the extension in Neon by running CREATE EXTENSION
vector; . For instructions, see Enable the pgvector extension.
An OpenAI API key is required to generate vectors for documents and queries.
Add the OpenAI API key as an operating system environment variable or provide it for the
session when prompted. If you define an environment variable, name the variable
OPENAI_API_KEY .
For information about configuring your OpenAI API key as an environment variable, refer to
Best Practices for API Key Safety.
# Test to ensure that your OpenAI API key is defined as an environment variable, or provide it when prompted
# If you run this notebook locally, you may have to reload the terminal and the notebook to make the environment variable available
import os
from getpass import getpass
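A minimal sketch of that check (the prompt wording is illustrative, not necessarily the notebook's exact code):

if os.getenv("OPENAI_API_KEY") is not None:
    print("Your OPENAI_API_KEY is ready")
else:
    # Ask for the key interactively and keep it for this session only
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")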
Provide your Neon database connection string below or define it in an .env file using a
DATABASE_URL variable. For information about obtaining a Neon connection string, see Connect from any application.
import os
import psycopg2
from dotenv import load_dotenv
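A sketch of how the connection might be established, assuming DATABASE_URL is available in the environment or an .env file:

load_dotenv()

connection_string = os.environ.get("DATABASE_URL")
if connection_string is None:
    connection_string = getpass("Enter your Neon connection string: ")

# Connect to the Neon Postgres database and open a cursor for the later steps
conn = psycopg2.connect(connection_string)
cur = conn.cursor()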
Next, download the pre-computed Wikipedia article embeddings so that you do not have to generate the embeddings yourself and spend your own API credits.
import wget

embeddings_url = "https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip"

# The download takes a few minutes depending on your connection
wget.download(embeddings_url)
'vector_database_wikipedia_articles_embedded.zip'
import zipfile
import os
import re
import tempfile

current_directory = os.getcwd()
zip_file_path = os.path.join(current_directory, "vector_database_wikipedia_articles_embedded.zip")
output_directory = os.path.join(current_directory, "../../data")

# Extract the downloaded archive into the data directory
with zipfile.ZipFile(zip_file_path, "r") as zip_ref:
    zip_ref.extractall(output_directory)

# Check that the extracted CSV file is present
file_name = "vector_database_wikipedia_articles_embedded.csv"
file_path = os.path.join(output_directory, file_name)

if os.path.exists(file_path):
    print(f"The csv file {file_name} exists in the data directory.")
else:
    print(f"The csv file {file_name} does not exist in the data directory.")
The vector table created in your database is called articles. Each object has title and content
vectors.
create_table_sql = '''
CREATE TABLE IF NOT EXISTS public.articles (
    id INTEGER NOT NULL,
    url TEXT,
    title TEXT,
    content TEXT,
    title_vector vector(1536),
    content_vector vector(1536),
    vector_id INTEGER
);
'''

# Execute the statement using the cursor opened when connecting above
cur.execute(create_table_sql)
conn.commit()
Load the pre-computed vector data into your articles table from the .csv file. There are
25000 records, so expect the operation to take several minutes.
import io
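A sketch of the bulk load using Postgres COPY through psycopg2; the CSV path and column order are assumed to match the archive extracted above:

csv_file_path = os.path.join(output_directory, "vector_database_wikipedia_articles_embedded.csv")

# Stream the CSV into the articles table; COPY is much faster than row-by-row INSERTs
with open(csv_file_path, "r", encoding="utf-8") as f:
    cur.copy_expert(
        "COPY public.articles (id, url, title, content, title_vector, content_vector, vector_id) "
        "FROM STDIN WITH (FORMAT csv, HEADER true)",
        f,
    )
conn.commit()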
Check the number of records to ensure the data has been loaded. There should be 25000
records.
Count: 25000
After the data is stored in your Neon database, you can query the data for nearest neighbors.
Start by defining the query_neon function, which is executed when you run the vector similarity
search. The function creates an embedding based on the user's query, prepares the SQL query,
and runs the SQL query with the embedding. The pre-computed embeddings that you loaded
into your database were created with the text-embedding-3-small OpenAI model, so you must use
the same model to create an embedding for the similarity search.
A vector_name parameter is provided that allows you to search based on "title" or "content".
return results
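Only the final return statement of query_neon survives above. Below is a minimal sketch of what the full function could look like, assuming the psycopg2 connection (conn/cur) and the embedding model described earlier; the names and SQL details are illustrative rather than the notebook's exact code.

import openai

def query_neon(query, collection_name="public.articles", vector_name="content_vector", top_k=20):
    # Create an embedding for the user's query with the same model used for the stored vectors
    embedded_query = openai.embeddings.create(
        input=query,
        model="text-embedding-3-small",
    ).data[0].embedding

    # pgvector accepts the '[x,y,z]' text representation of a vector
    embedded_query_pg = "[" + ",".join(map(str, embedded_query)) + "]"

    # Order rows by distance between the stored vector and the query vector
    query_sql = f"""
    SELECT id, url, title, l2_distance({vector_name}, %s::vector(1536)) AS distance
    FROM {collection_name}
    ORDER BY {vector_name} <-> %s::vector(1536)
    LIMIT %s;
    """
    cur.execute(query_sql, (embedded_query_pg, embedded_query_pg, top_k))
    results = cur.fetchall()

    return results

# Example usage
query_results = query_neon("Greek mythology", top_k=5)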
In this notebook we share examples of how to implement guardrails for your LLM applications.
A guardrail is a generic term for detective controls that aim to steer your application. Greater
steerability is a common requirement given the inherent randomness of LLMs, and so creating
effective guardrails has become one of the most common areas of performance optimization
when pushing an LLM from prototype to production.
Guardrails are incredibly diverse and can be deployed to virtually any context where you can imagine
something going wrong with LLMs. This notebook aims to give simple examples that can be
extended to meet your unique use case, as well as outlining the trade-offs to consider when
deciding whether to implement a guardrail, and how to do it. In particular, we'll look at:
1. Input guardrails that flag inappropriate content before it gets to your LLM
2. Output guardrails that validate what your LLM has produced before it gets to the customer
Note: This notebook tackles guardrails as a generic term for detective controls around an LLM -
for the official libraries that provide distributions of pre-built guardrails frameworks, please
check out the following:
NeMo Guardrails
Guardrails AI
import openai
GPT_MODEL = 'gpt-3.5-turbo'
1. Input guardrails
Input guardrails aim to prevent inappropriate content getting to the LLM in the first place -
some common use cases are:
Topical guardrails: Identify when a user asks an off-topic question and give them advice on
what topics the LLM can help them with.
Jailbreaking: Detect when a user is trying to hijack the LLM and override its prompting.
Prompt injection: Pick up instances of prompt injection where users try to hide malicious
code that will be executed in any downstream functions the LLM executes.
In all of these cases, guardrails act as a preventative control, running either before or in parallel with the
LLM, and triggering your application to behave differently if one of these criteria is met.
Designing a guardrail
When designing guardrails it is important to consider the trade-off between accuracy, latency
and cost, where you try to achieve maximum accuracy for the least impact to your bottom line
and the user's experience.
We'll begin with a simple topical guardrail which aims to detect off-topic questions and prevent
the LLM from answering if triggered. This guardrail consists of a simple prompt and uses
gpt-3.5-turbo, prioritizing low latency and cost over accuracy. If we wanted to optimize further, we
could consider:
Accuracy: You could consider using a fine-tuned model or few-shot examples to increase
the accuracy. RAG can also be effective if you have a corpus of information that can help
determine whether a piece of content is allowed or not.
Latency/Cost: You could try fine-tuning smaller models, such as babbage-002 or open-
source offerings like Llama, which can perform quite well when given enough training
examples. When using open-source offerings you can also tune the machines you are using
for inference to maximize either cost or latency reduction.
This simple guardrail aims to ensure the LLM only answers to a predefined set of topics, and
responds to out-of-bounds queries with a canned message.
Embrace async
A common design to minimize latency is to send your guardrails asynchronously along with
your main LLM call. If your guardrails get triggered you send back their response, otherwise
send back the LLM response.
We'll use this approach, creating an execute_chat_with_guardrails function that will run our
LLM's get_chat_response and the topical_guardrail guardrail in parallel, and return the LLM
response only if the guardrail returns allowed .
Limitations
You should always consider the limitations of guardrails when developing your design. A few of
the key ones to be aware of are:
When using LLMs as a guardrail, be aware that they have the same vulnerabilities as your
base LLM call itself. For example, a prompt injection attempt could be successful in
evading both your guardrail and your actual LLM call.
As conversations get longer, LLMs are more susceptible to jailbreaking as your instructions
become diluted by the extra text.
Guardrails can harm the user experience if you make them overly restrictive to compensate
for the issues noted above. This manifests as over-refusals, where your guardrails reject
innocuous user requests because there are similarities with prompt injection or jailbreaking
attempts.
Mitigations
If you can combine guardrails with rules-based or more traditional machine learning models for
detection this can mitigate some of these risks. We've also seen customers have guardrails that
only ever consider the latest message, to alleviate the risks of the model being confused by a
long conversation.
We would also recommend doing a gradual roll-out with active monitoring of conversations so
you can pick up instances of prompt injection or jailbreaking, and either add more guardrails to
cover these new types of behaviour, or include them as training examples to your existing
guardrails.
system_prompt = "You are a helpful assistant."
import asyncio
return response.choices[0].message.content
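Only the final return of get_chat_response survives above. Here is a minimal sketch of both helper coroutines it relies on, assuming the openai Python SDK v1 async client; the guardrail prompt wording is illustrative rather than the notebook's exact text.

from openai import AsyncOpenAI

async_client = AsyncOpenAI()

async def get_chat_response(user_request):
    print("Getting LLM response")
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_request},
    ]
    response = await async_client.chat.completions.create(
        model=GPT_MODEL, messages=messages, temperature=0.5
    )
    print("Got LLM response")
    return response.choices[0].message.content


async def topical_guardrail(user_request):
    print("Checking topical guardrail")
    messages = [
        {
            "role": "system",
            "content": "Your role is to assess whether the user question is allowed or not. "
            "The allowed topics are cats and dogs. If the topic is allowed, say 'allowed', "
            "otherwise say 'not_allowed'.",
        },
        {"role": "user", "content": user_request},
    ]
    response = await async_client.chat.completions.create(
        model=GPT_MODEL, messages=messages, temperature=0
    )
    print("Got guardrail response")
    return response.choices[0].message.content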
async def execute_chat_with_guardrail(user_request):
    topical_guardrail_task = asyncio.create_task(topical_guardrail(user_request))
    chat_task = asyncio.create_task(get_chat_response(user_request))

    while True:
        done, _ = await asyncio.wait(
            [topical_guardrail_task, chat_task], return_when=asyncio.FIRST_COMPLETED
        )
        if topical_guardrail_task in done:
            guardrail_response = topical_guardrail_task.result()
            if guardrail_response == "not_allowed":
                chat_task.cancel()
                print("Topical guardrail triggered")
                return "I can only talk about cats and dogs, the best animals that ever lived."
            elif chat_task in done:
                chat_response = chat_task.result()
                return chat_response
        else:
            await asyncio.sleep(0.1)  # sleep for a bit before checking the tasks again
# Example requests (representative wording; the original notebook's exact phrasing may differ)
good_request = "What are the best breeds of dog for people that like cats?"
bad_request = "I want to talk about horses"
# Call the main function with the good request - this should go through
response = await execute_chat_with_guardrail(good_request)
print(response)
1. Basenji: Known as the "barkless dog," Basenjis are independent, clean, and have a cat-like g
2. Shiba Inu: Shiba Inus are often described as having a cat-like personality. They are indepen
3. Greyhound: Greyhounds are quiet, low-energy dogs that enjoy lounging around, much like cats.
4. Bichon Frise: Bichon Frises are small, friendly dogs that are often compared to cats due to
5. Cavalier King Charles Spaniel: These dogs are affectionate, gentle, and adaptable, making th
Remember, individual dogs can have different personalities, so it's important to spend time wit
# Call the main function with the good request - this should get blocked
response = await execute_chat_with_guardrail(bad_request)
print(response)
Looks like our guardrail worked - the first question was allowed through, but the second was
blocked for being off-topic. Now we'll extend this concept to moderate the response we get
from the LLM as well.
2. Output guardrails
Output guardrails govern what the LLM comes back with. These can take many forms, with
some of the most common being:
Moderation guardrails: Applying brand and corporate guidelines to moderate the LLM's
results, and either blocking or rewriting its response if it breaches them.
Syntax checks: Structured outputs from LLMs can be returned corrupt or unable to be
parsed - these guardrails detect those and either retry or fail gracefully, preventing failures
in downstream applications.
This is a common control to apply with function calling, ensuring that the expected
schema is returned in the arguments when the LLM returns a function_call .
Moderation guardrail
Here we implement a moderation guardrail that uses a version of the G-Eval evaluation
method to score the presence of unwanted content in the LLM's response. This method is
demonstrated in more detail in one of our other notebooks.
To accomplish this we will make an extensible framework for moderating content that takes in a
domain and applies criteria to a piece of content using a set of steps :
1. We set a domain name, which describes the type of content we're going to moderate.
2. We provide criteria, which outline clearly what the content should and should not contain.
3. Step-by-step instructions are provided for the LLM to grade the content.
Our output guardrail will assess the LLM's response and block anything scoring a 3 or higher.
Setting this threshold is a common area for optimization - we recommend building an
evaluation set and grading the results using a confusion matrix to set the right tolerance for
your guardrail. The trade-off here is generally:
More false positives lead to a fractured user experience, where customers get annoyed
and the assistant seems less helpful.
More false negatives can cause lasting harm to your business, as people get the assistant to
answer inappropriate questions, or prompt inject/jailbreak it.
For example, for jailbreaking you may want to have a very low threshold, as the risk to your
business if your LLM is hijacked and used to produce dangerous content that ends up on social
media is very high. However, for our use case we're willing to accept a few false negatives, as
the worst that could happen is someone ends up with a Bichon Frise who might have been
better suited to a Labrador, which though sad will probably not cause lasting damage to our
business (we hope).
animal_advice_criteria = """
Assess the presence of explicit recommendation of cat or dog breeds in the content.
The content should contain only general advice about cats and dogs, not specific breeds to purchase."
animal_advice_steps = """
1. Read the content and the criteria carefully.
2. Assess how much explicit recommendation of cat or dog breeds is contained in the content.
3. Assign an animal advice score from 1 to 5, with 1 being no explicit cat or dog breed advice, and 5
"""
moderation_system_prompt = """
You are a moderation assistant. Your role is to detect content about {domain} in the text provided, a
## {domain}
### Criteria
{scoring_criteria}
### Instructions
{scoring_steps}
### Content
{content}
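The moderation_guardrail coroutine used below is not shown above. A minimal sketch, assuming the async client from earlier and asking the model to reply with a single 1-5 score; the domain label is illustrative.

domain = "animal breed recommendation"  # illustrative domain label

async def moderation_guardrail(chat_response):
    print("Checking moderation guardrail")
    mod_messages = [
        {
            "role": "user",
            "content": moderation_system_prompt.format(
                domain=domain,
                scoring_criteria=animal_advice_criteria,
                scoring_steps=animal_advice_steps,
                content=chat_response,
            ),
        },
    ]
    response = await async_client.chat.completions.create(
        model=GPT_MODEL, messages=mod_messages, temperature=0
    )
    print("Got moderation response")
    return response.choices[0].message.content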
# Wrapper that runs the topical guardrail, the chat response, and the moderation guardrail
# (the function name is representative; it extends execute_chat_with_guardrail above)
async def execute_all_guardrails(user_request):
    topical_guardrail_task = asyncio.create_task(topical_guardrail(user_request))
    chat_task = asyncio.create_task(get_chat_response(user_request))

    while True:
        done, _ = await asyncio.wait(
            [topical_guardrail_task, chat_task], return_when=asyncio.FIRST_COMPLETED
        )
        if topical_guardrail_task in done:
            guardrail_response = topical_guardrail_task.result()
            if guardrail_response == "not_allowed":
                chat_task.cancel()
                print("Topical guardrail triggered")
                return "I can only talk about cats and dogs, the best animals that ever lived."
            elif chat_task in done:
                chat_response = chat_task.result()
                moderation_response = await moderation_guardrail(chat_response)
                if int(moderation_response) >= 3:
                    print(f"Moderation guardrail flagged with a score of {int(moderation_response)}")
                    return "Sorry, we're not permitted to give animal breed advice. I can help you with any general queries you might have."
                else:
                    print("Passed moderation")
                    return chat_response
        else:
            await asyncio.sleep(0.1)  # sleep for a bit before checking the tasks again
# Adding a request that should pass both our topical guardrail and our moderation guardrail
great_request = 'What is some advice you can give to a new dog owner?'
tests = [good_request,bad_request,great_request]
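To exercise all three requests, the combined guardrail function can be run over the test list (a sketch; the original notebook prints each response in turn):

for test in tests:
    result = await execute_all_guardrails(test)
    print(result)
    print("\n\n")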
Conclusion
Guardrails are a vibrant and evolving topic in LLMs, and we hope this notebook has given you
an effective introduction to the core concepts around guardrails. To recap:
Guardrails are detective controls that aim to prevent harmful content getting to your
applications and your users, and add steerability to your LLM in production.
They can take the form of input guardrails, which target content before it gets to the LLM,
and output guardrails, which control the LLM's response.
Designing guardrails and setting their thresholds is a trade-off between accuracy, latency,
and cost. Your decision should be based on clear evaluations of the performance of your
guardrails, and an understanding of what the cost of a false negative and false positive are
for your business.
We look forward to seeing how you take this forward, and how thinking on guardrails evolves as
the ecosystem matures.
Note: you will need an OpenAI API key to run this colab.
If you use OpenAI's API to fine-tune ChatGPT-3.5, you can now use the W&B integration to
track experiments, models, and datasets in your central dashboard.
See the OpenAI section in the Weights & Biases documentation for full details of the
integration
It's always more fun to experiment with your own projects so if you have already used the
openai API to fine-tune an OpenAI model, just skip this section.
import os
import json
import random
import tiktoken
import numpy as np
import pandas as pd
from pathlib import Path
from tqdm.auto import tqdm
from collections import defaultdict
from tenacity import retry, stop_after_attempt, wait_fixed
Start your Weights & Biases run. If you don't have an account you can sign up for one for free at
www.wandb.ai
import openai
import wandb

WANDB_PROJECT = "OpenAI-Fine-Tune"

# Enter credentials
openai_key = "YOUR_API_KEY"
openai.api_key = openai_key
Dataset Preparation
We download a dataset from LegalBench, a project to curate tasks for evaluating legal
reasoning, specifically the Contract NLI Explicit Identification task.
This comprises a total of 117 examples, from which we will create our own train and test
datasets.
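The notebook loads the task with the Hugging Face datasets library; a minimal sketch (the dataset identifier and configuration name are assumptions based on LegalBench's Hugging Face release):

from datasets import load_dataset

# Assumed dataset id and config for the Contract NLI Explicit Identification task
dataset = load_dataset("nguha/legalbench", "contract_nli_explicit_identification")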
data = []
for d in dataset["train"]:
data.append(d)
for d in dataset["test"]:
data.append(d)
random.shuffle(data)
len(data), data[0:2]
(117,
[{'answer': 'No',
'index': '94',
'text': 'Recipient shall use the Confidential Information exclusively for HySafe purposes, e
'document_name': 'NDA_V3.pdf',
'new_index': 0},
{'answer': 'No',
'index': '53',
'text': '3. In consideration of each and every disclosure of CONFIDENTIAL INFORMATION, the P
'document_name': '1084000_0001144204-06-046785_v056501_ex10-16.txt',
'new_index': 1}])
We modify the base_prompt from the LegalBench task to make it a zero-shot prompt, as we are
training the model instead of using few-shot prompting
base_prompt_zero_shot = "Identify if the clause provides that all Confidential Information shall be e
We now split it into training and validation datasets; let's train on 30 samples and test on the
remainder.
n_train = 30
n_test = len(data) - n_train

train_messages = []
test_messages = []

for i, d in enumerate(data):
    prompts = []
    prompts.append({"role": "system", "content": base_prompt_zero_shot})
    prompts.append({"role": "user", "content": d["text"]})
    prompts.append({"role": "assistant", "content": d["answer"]})
    # The first n_train examples go to the training set, the rest to the test set
    if i < n_train:
        train_messages.append({"messages": prompts})
    else:
        test_messages.append({"messages": prompts})

len(train_messages), len(test_messages), n_test, train_messages[5]
(30,
87,
87,
{'messages': [{'role': 'system',
'content': 'Identify if the clause provides that all Confidential Information shall be expr
{'role': 'user',
'content': '2. The Contractor shall not, without the State’s prior written consent, copy, d
{'role': 'assistant', 'content': 'No'}]})
train_file_path = 'encoded_train_data.jsonl'
with open(train_file_path, 'w') as file:
for item in train_messages:
line = json.dumps(item)
file.write(line + '\n')
test_file_path = 'encoded_test_data.jsonl'
with open(test_file_path, 'w') as file:
for item in test_messages:
line = json.dumps(item)
file.write(line + '\n')
Next, we validate that our training data is in the correct format using a script from the OpenAI
fine-tuning documentation
# Next, we specify the data path and open the JSONL file
def openai_validate_data(dataset_path):
data_path = dataset_path
# Load dataset
with open(data_path) as f:
dataset = [json.loads(line) for line in f]
# We can inspect the data quickly by checking the number of examples and the first item
# Initial dataset stats
print("Num examples:", len(dataset))
print("First example:")
for message in dataset[0]["messages"]:
print(message)
# Now that we have a sense of the data, we need to go through all the different examples and check the format
format_errors = defaultdict(int)
for ex in dataset:
if not isinstance(ex, dict):
format_errors["data_type"] += 1
continue
if format_errors:
print("Found errors:")
for k, v in format_errors.items():
print(f"{k}: {v}")
else:
print("No errors found")
# Beyond the structure of the message, we also need to ensure that the length does not exceed the 4
# not exact!
# simplified from https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_
encoding = tiktoken.get_encoding("cl100k_base")
def num_tokens_from_messages(messages, tokens_per_message=3, tokens_per_name=1):
num_tokens = 0
for message in messages:
num_tokens += tokens_per_message
for key, value in message.items():
num_tokens += len(encoding.encode(value))
if key == "name":
num_tokens += tokens_per_name
num_tokens += 3
return num_tokens
def num_assistant_tokens_from_messages(messages):
num_tokens = 0
for message in messages:
if message["role"] == "assistant":
num_tokens += len(encoding.encode(message["content"]))
return num_tokens
# Last, we can look at the results of the different formatting operations before proceeding with creating a fine-tuning job
n_missing_system = 0
n_missing_user = 0
n_messages = []
convo_lens = []
assistant_message_lens = []

for ex in dataset:
    messages = ex["messages"]
    if not any(message["role"] == "system" for message in messages):
        n_missing_system += 1
    if not any(message["role"] == "user" for message in messages):
        n_missing_user += 1
    n_messages.append(len(messages))
    convo_lens.append(num_tokens_from_messages(messages))
    assistant_message_lens.append(num_assistant_tokens_from_messages(messages))
MIN_TARGET_EXAMPLES = 100
MAX_TARGET_EXAMPLES = 25000
TARGET_EPOCHS = 3
MIN_EPOCHS = 1
MAX_EPOCHS = 25
n_epochs = TARGET_EPOCHS
n_train_examples = len(dataset)
if n_train_examples * TARGET_EPOCHS < MIN_TARGET_EXAMPLES:
n_epochs = min(MAX_EPOCHS, MIN_TARGET_EXAMPLES // n_train_examples)
elif n_train_examples * TARGET_EPOCHS > MAX_TARGET_EXAMPLES:
n_epochs = max(MIN_EPOCHS, MAX_TARGET_EXAMPLES // n_train_examples)
openai_validate_data(train_file_path)
Num examples: 30
First example:
{'role': 'system', 'content': 'Identify if the clause provides that all Confidential Informatio
{'role': 'user', 'content': 'Recipient shall use the Confidential Information exclusively for H
{'role': 'assistant', 'content': 'No'}
No errors found
Num examples missing system message: 0
Num examples missing user message: 0
0 examples may be over the 4096 token limit, they will be truncated during fine-tuning
Dataset has ~4304 tokens that will be charged for during training
By default, you'll train for 3 epochs on this dataset
By default, you'll be charged for ~12912 tokens
See pricing page to estimate total costs
Log our data to Weights & Biases Artifacts for storage and versioning
wandb.init(
project=WANDB_PROJECT,
# entity="prompt-eng",
job_type="log-data",
config = {'n_train': n_train,
'n_valid': n_test})
wandb.log_artifact(train_file_path,
"legalbench-contract_nli_explicit_identification-train",
type="train-data")
wandb.log_artifact(test_file_path,
"legalbench-contract_nli_explicit_identification-test",
type="test-data")
# keep entity (typically your wandb username) for reference of artifact later in this demo
entity = wandb.run.entity
wandb.finish()
Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAM
wandb: Currently logged in as: capecape. Use `wandb login --relogin`
Let's first download our training & validation files and save them to a folder called my_data . We
will retrieve the latest version of the artifact, but it could also be v0 , v1 or any alias we
associated with it
wandb.init(project=WANDB_PROJECT,
# entity="prompt-eng",
job_type="finetune")
artifact_train = wandb.use_artifact(
f'{entity}/{WANDB_PROJECT}/legalbench-contract_nli_explicit_identification-train:latest',
type='train-data')
train_file = artifact_train.get_path(train_file_path).download("my_data")
train_file
'my_data/encoded_train_data.jsonl'
Then we upload the training data to OpenAI. OpenAI has to process the data, so this will take a
few minutes depending on the size of your dataset.
openai_train_file_info = openai.File.create(
file=open(train_file, "rb"),
purpose='fine-tune'
)
# you may need to wait a couple of minutes for OpenAI to process the file
openai_train_file_info
model = 'gpt-3.5-turbo'
n_epochs = 3
openai_ft_job_info = openai.FineTuningJob.create(
training_file=openai_train_file_info["id"],
model=model,
hyperparameters={"n_epochs": n_epochs}
)
ft_job_id = openai_ft_job_info["id"]
openai_ft_job_info
This takes around 5 minutes to train, and you get an email from OpenAI when it is finished.
That's it!
Now your model is training on OpenAI's machines. To get the current state of your fine-tuning
job, run:
state = openai.FineTuningJob.retrieve(ft_job_id)
state["status"], state["trained_tokens"], state["finished_at"], state["fine_tuned_model"]
('succeeded',
12732,
1693389024,
'ft:gpt-3.5-turbo-0613:weights-biases::7tC85HcX')
openai.FineTuningJob.list_events(id=ft_job_id, limit=5)
We can run a few different fine-tunes with different parameters or even with different datasets.
Log OpenAI fine-tune jobs to Weights & Biases
usage: openai wandb sync [-h] [-i ID] [-n N_FINE_TUNES] [--project PROJECT]
[--entity ENTITY] [--force] [--legacy]
options:
-h, --help show this help message and exit
-i ID, --id ID The id of the fine-tune job (optional)
-n N_FINE_TUNES, --n_fine_tunes N_FINE_TUNES
Number of most recent fine-tunes to log when an id is
not provided. By default, every fine-tune is synced.
--project PROJECT Name of the Weights & Biases project where you're
sending runs. By default, it is "OpenAI-Fine-Tune".
--entity ENTITY Weights & Biases username or team name where you're
sending runs. By default, your default entity is used,
which is usually your username.
--force Forces logging and overwrite existing wandb run of the
same fine-tune.
--legacy Log results from legacy OpenAI /v1/fine-tunes api
Calling openai wandb sync will log all un-synced fine-tuned jobs to W&B
See the OpenAI section in the Weights & Biases documentation for full details of the
integration
🎉
wandb: Find logs at: ./wandb/run-20230830_115915-ftjob-x4tl83IlSGolkUF3fC
wandb sync completed successfully
wandb.finish()
wandb: WARNING Source type is set to 'repo' but some required information is missing from the e
upload_file exception https://storage.googleapis.com/wandb-production.appspot.com/capecape/Open
upload_file request headers: {'User-Agent': 'python-requests/2.28.2', 'Accept-Encoding': 'gzip,
upload_file response body:
upload_file exception https://storage.googleapis.com/wandb-production.appspot.com/capecape/Open
upload_file request headers: {'User-Agent': 'python-requests/2.28.2', 'Accept-Encoding': 'gzip,
upload_file response body:
Anytime we have new fine-tunes, we can just call openai wandb sync to add them to our
dashboard.
The best way to evaluate a generative model is to explore sample predictions from your
evaluation set.
Let's generate a few inference samples and log them to W&B and see how the performance
compares to a baseline ChatGPT-3.5 model
wandb.init(project=WANDB_PROJECT,
job_type='eval')
artifact_valid = wandb.use_artifact(
f'{entity}/{WANDB_PROJECT}/legalbench-contract_nli_explicit_identification-test:latest',
type='test-data')
test_file = artifact_valid.get_path(test_file_path).download("my_data")
with open(test_file) as f:
test_dataset = [json.loads(line) for line in f]
@retry(stop=stop_after_attempt(3), wait=wait_fixed(60))
def call_openai(messages="", model="gpt-3.5-turbo"):
return openai.ChatCompletion.create(model=model, messages=messages, max_tokens=10)
state = openai.FineTuningJob.retrieve(ft_job_id)
ft_model_id = state["fine_tuned_model"]
ft_model_id
'ft:gpt-3.5-turbo-0613:weights-biases::7tC85HcX'
eval_data = []
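The loop that produces the predictions is not shown above; a minimal sketch, assuming the fine-tuned model id retrieved earlier and a W&B Table for logging (column names are illustrative):

prediction_table = wandb.Table(columns=["messages", "completion", "target"])

for row in tqdm(test_dataset):
    messages = row["messages"][:2]   # system + user messages
    target = row["messages"][2]      # reference assistant answer
    res = call_openai(messages=messages, model=ft_model_id)
    completion = res.choices[0].message.content
    eval_data.append([messages, completion, target])
    prediction_table.add_data(messages[1]["content"], completion, target["content"])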
wandb.log({'predictions': prediction_table})
correct = 0
for e in eval_data:
    if e[1].lower() == e[2]["content"].lower():
        correct += 1

accuracy = correct / len(eval_data)

print(f"Accuracy is {accuracy}")
wandb.log({"eval/accuracy": accuracy})
wandb.summary["eval/accuracy"] = accuracy
Accuracy is 0.8390804597701149
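The baseline comparison against the un-fine-tuned gpt-3.5-turbo model is built the same way; a sketch under the same assumptions:

baseline_prediction_table = wandb.Table(columns=["messages", "completion", "target"])
baseline_eval_data = []

for row in tqdm(test_dataset):
    messages = row["messages"][:2]
    target = row["messages"][2]
    res = call_openai(messages=messages, model="gpt-3.5-turbo")
    completion = res.choices[0].message.content
    baseline_eval_data.append([messages, completion, target])
    baseline_prediction_table.add_data(messages[1]["content"], completion, target["content"])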
wandb.log({'baseline_predictions': baseline_prediction_table})

baseline_correct = 0
for e in baseline_eval_data:
    if e[1].lower() == e[2]["content"].lower():
        baseline_correct += 1
baseline_accuracy = baseline_correct / len(baseline_eval_data)
print(f"Baseline accuracy is {baseline_accuracy}")
wandb.log({"eval/baseline_accuracy": baseline_accuracy})
wandb.summary["eval/baseline_accuracy"] = baseline_accuracy

wandb.finish()
wandb: WARNING Source type is set to 'repo' but some required information is missing from the e
Run summary:
eval/accuracy 0.83908
eval/baseline_accuracy 0.7931
And that's it! In this example we have prepared our data, logged it to Weights & Biases, fine-
tuned an OpenAI model using that data, logged the results to Weights & Biases, and then run
evaluation on the fine-tuned model.
From here you can start to train on larger or more complex tasks, or explore other ways to
modify ChatGPT-3.5, such as giving it a different tone and style of response.
Resources
OpenAI Fine-Tuning Guide
Let's begin by downloading the required libraries for this notebook:
To get Zilliz up and running, take a look here. With your account and database set up, proceed
to set the following values:
import openai
URI = 'your_uri'
TOKEN = 'your_token' # TOKEN == user:password or api_key
COLLECTION_NAME = 'book_search'
DIMENSION = 1536
OPENAI_ENGINE = 'text-embedding-3-small'
openai.api_key = 'sk-your-key'
INDEX_PARAM = {
'metric_type':'L2',
'index_type':"AUTOINDEX",
'params':{}
}
QUERY_PARAM = {
"metric_type": "L2",
"params": {},
}
BATCH_SIZE = 1000
Zilliz
This segment deals with Zilliz and setting up the database for this use case. Within Zilliz we
need to set up a collection and index it, as sketched below.
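A minimal sketch of the collection and index setup with pymilvus, assuming the URI, TOKEN, and parameters defined above; the field names are illustrative:

from pymilvus import connections, utility, FieldSchema, CollectionSchema, DataType, Collection

# Connect to the Zilliz Cloud instance
connections.connect(uri=URI, token=TOKEN)

# Remove the collection if it already exists
if utility.has_collection(COLLECTION_NAME):
    utility.drop_collection(COLLECTION_NAME)

fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=64000),
    FieldSchema(name="description", dtype=DataType.VARCHAR, max_length=64000),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=DIMENSION),
]
schema = CollectionSchema(fields=fields)
collection = Collection(name=COLLECTION_NAME, schema=schema)

# Create the index and load the collection into memory for searching
collection.create_index(field_name="embedding", index_params=INDEX_PARAM)
collection.load()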
Dataset
With Zilliz up and running we can begin grabbing our data. Hugging Face Datasets is a hub
that holds many different user datasets, and for this example we are using Skelebor's book
dataset. This dataset contains title-description pairs for over 1 million books. We are going to
embed each description and store it within Zilliz along with its title.
import datasets
# Download the dataset and only use the `train` portion (file is around 800Mb)
dataset = datasets.load_dataset('Skelebor/book_titles_and_descriptions_en_clean', split='train')
/Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages/tqdm/auto.py:22: Tqd
from .autonotebook import tqdm as notebook_tqdm
Found cached dataset parquet (/Users/filiphaltmayer/.cache/huggingface/datasets/Skelebor___parq
Now that we have our data on our machine we can begin embedding it and inserting it into
Zilliz. The embedding function takes in text and returns the embeddings in a list format.
This next step does the actual inserting. Because there are so many datapoints, if you want to
immediately test it out you can stop the inserting cell block early and move along. Doing this
will probably decrease the accuracy of the results because fewer datapoints are available, but it should still be
good enough.
data = [
[], # title
[], # description
]
KeyboardInterrupt
With our data safely inserted in Zilliz, we can now perform a query. The query takes in a string
or a list of strings and searches them. The output prints your provided description along with
each match's score, title, and book description.
import textwrap
Rank: 3 Score: 0.33899369835853577 Title: Dogs on Duty: Soldiers' Best Friends on the Battl
When the news of the raid on Osama Bin Laden's compound broke, the SEAL team member that
stole the show was a highly trained canine companion. Throughout history, dogs have been
key contributors to military units. Dorothy Hinshaw Patent follows man's best friend
In this notebook we will be going over generating embeddings of book descriptions with
OpenAI and using those embeddings within Milvus to find relevant books. The dataset in this
example is sourced from HuggingFace datasets, and contains a little over 1 million title-
description pairs.
Let's begin by downloading the required libraries for this notebook:
With the required packages installed we can get started. Let's begin by launching the Milvus
service. The file being run is the docker-compose.yaml found in the folder of this file. This
command launches a Milvus standalone instance which we will use for this test.
! docker compose up -d
import openai
HOST = 'localhost'
PORT = 19530
COLLECTION_NAME = 'book_search'
DIMENSION = 1536
OPENAI_ENGINE = 'text-embedding-3-small'
openai.api_key = 'sk-your_key'
INDEX_PARAM = {
'metric_type':'L2',
'index_type':"HNSW",
'params':{'M': 8, 'efConstruction': 64}
}
QUERY_PARAM = {
"metric_type": "L2",
"params": {"ef": 64},
}
BATCH_SIZE = 1000
Milvus
This segment deals with Milvus and setting up the database for this use case. Within Milvus we
need to set up a collection and index the collection.
Dataset
With Milvus up and running we can begin grabbing our data. Hugging Face Datasets is a hub
that holds many different user datasets, and for this example we are using Skelebor's book
dataset. This dataset contains title-description pairs for over 1 million books. We are going to
embed each description and store it within Milvus along with its title.
import datasets
# Download the dataset and only use the `train` portion (file is around 800Mb)
dataset = datasets.load_dataset('Skelebor/book_titles_and_descriptions_en_clean', split='train')
/Users/filiphaltmayer/miniconda3/envs/haystack/lib/python3.9/site-packages/tqdm/auto.py:22: Tqd
from .autonotebook import tqdm as notebook_tqdm
Found cached dataset parquet (/Users/filiphaltmayer/.cache/huggingface/datasets/Skelebor___parq
Now that we have our data on our machine we can begin embedding it and inserting it into
Milvus. The embedding function takes in text and returns the embeddings in a list format.
# Simple function that converts the texts to embeddings
def embed(texts):
embeddings = openai.Embedding.create(
input=texts,
engine=OPENAI_ENGINE
)
return [x['embedding'] for x in embeddings['data']]
This next step does the actual inserting (the loop is sketched after the snippet below). Because there are so many datapoints, if you want to
immediately test it out you can stop the inserting cell block early and move along. Doing this will
probably decrease the accuracy of the results because fewer datapoints are available, but it should still be good
enough.
data = [
[], # title
[], # description
]
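A minimal sketch of the insertion loop that fills the lists above and writes batches to the collection; it assumes a collection object created the same way as in the Zilliz section, plus the embed helper and BATCH_SIZE defined earlier:

from tqdm import tqdm

for i, record in enumerate(tqdm(dataset)):
    data[0].append(record["title"])
    data[1].append(record["description"])

    # Embed and insert a batch once enough records have accumulated
    if len(data[0]) % BATCH_SIZE == 0:
        data.append(embed(data[1]))
        collection.insert(data)
        data = [[], []]

# Embed and insert whatever is left over
if len(data[0]) != 0:
    data.append(embed(data[1]))
    collection.insert(data)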
KeyboardInterrupt
With our data safely inserted in Milvus, we can now perform a query. The query takes in a string
or a list of strings and searches them. The output prints your provided description along with
each match's score, title, and book description.
import textwrap
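A sketch of such a query helper, assuming the collection, embed, and QUERY_PARAM defined above (output formatting is illustrative):

def query_collection(queries, top_k=5):
    if not isinstance(queries, list):
        queries = [queries]

    res = collection.search(
        embed(queries),
        anns_field="embedding",
        param=QUERY_PARAM,
        limit=top_k,
        output_fields=["title", "description"],
    )
    for i, hits in enumerate(res):
        print("Description:", queries[i])
        for rank, hit in enumerate(hits, start=1):
            print(textwrap.fill(f"Rank: {rank} Score: {hit.score} Title: {hit.entity.get('title')}", 88))
            print(textwrap.fill(hit.entity.get("description"), 88), "\n")

# Example usage
query_collection("Book about a dog serving in the military")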
Regression means predicting a number, rather than one of the categories. We will predict the
score based on the embedding of the review's text. We split the dataset into a training and a
testing set for all of the following tasks, so we can realistically evaluate performance on unseen
data. The dataset is created in the Get_embeddings_from_dataset Notebook.
We're predicting the score of the review, which is a number between 1 and 5 (1-star being
negative and 5-star positive).
import pandas as pd
import numpy as np
from ast import literal_eval
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

datafile_path = "data/fine_food_reviews_with_embeddings_1k.csv"

df = pd.read_csv(datafile_path)
df["embedding"] = df.embedding.apply(literal_eval).apply(np.array)

# Split into train/test sets before fitting the regressor
# (the 1-5 review score column is assumed to be named "Score")
X_train, X_test, y_train, y_test = train_test_split(list(df.embedding.values), df.Score, test_size=0.2, random_state=42)

rfr = RandomForestRegressor(n_estimators=100)
rfr.fit(X_train, y_train)
preds = rfr.predict(X_test)
We can see that the embeddings are able to predict the scores with an average error of 0.53 per
score prediction. This is roughly equivalent to predicting half of reviews perfectly, and half off by
one star.
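A small sketch of how an error figure like this can be computed from the predictions above, using scikit-learn's regression metrics:

from sklearn.metrics import mean_absolute_error, mean_squared_error

mse = mean_squared_error(y_test, preds)
mae = mean_absolute_error(y_test, preds)
print(f"Embedding regression performance: mse={mse:.2f}, mae={mae:.2f}")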
You could also train a classifier to predict the label, or use the embeddings within an existing ML
model to encode free text features.
This notebook builds on the concepts in the argument generation notebook, by creating an
agent with access to a knowledge base and two functions that it can call based on the user
requirement.
We'll create an agent that uses data from arXiv to answer questions about academic subjects. It
has two functions at its disposal:
get_articles: A function that gets arXiv articles on a subject and summarizes them for the
user with links.
read_article_and_summarize: A function that reads a previously downloaded article in full and
returns a structured summary to the user.
This will get you comfortable with a multi-function workflow that can choose from multiple
services, and where some of the data from the first function is persisted to be used by the
second.
Walkthrough
Search utilities: Creating the two functions that access arXiv for answers.
Configure Agent: Building up the Agent behaviour that will assess the need for a function
and, if one is required, call that function and present results back to the agent.
import os
import arxiv
import ast
import concurrent
import json
import os
import pandas as pd
import tiktoken
from csv import writer
from IPython.display import display, Markdown, Latex
from openai import OpenAI
from PyPDF2 import PdfReader
from scipy import spatial
from tenacity import retry, wait_random_exponential, stop_after_attempt
from tqdm import tqdm
from termcolor import colored
GPT_MODEL = "gpt-3.5-turbo-0613"
EMBEDDING_MODEL = "text-embedding-ada-002"
client = OpenAI()
Search utilities
We'll first set up some utilities that will underpin our two functions.
Downloaded papers will be stored in a directory (we use ./data/papers here). We create a file
arxiv_library.csv to store the embeddings and details for downloaded papers to retrieve
against using summarize_text .
directory = './data/papers'
# Write to file
with open(library, "a") as f_object:
writer_object = writer(f_object)
writer_object.writerow(file_reference)
f_object.close()
return result_list
{'title': 'Entanglement entropy and deconfined criticality: emergent SO(5) symmetry and proper
'summary': "We study the R\\'enyi entanglement entropy (EE) of the two-dimensional $J$-$Q$\nmo
'article_url': 'http://arxiv.org/abs/2401.14396v1',
'pdf_url': 'http://arxiv.org/pdf/2401.14396v1'}
def strings_ranked_by_relatedness(
query: str,
df: pd.DataFrame,
relatedness_fn=lambda x, y: 1 - spatial.distance.cosine(x, y),
top_n: int = 100,
) -> list[str]:
"""Returns a list of strings and relatednesses, sorted from most related to least."""
query_embedding_response = embedding_request(query)
query_embedding = query_embedding_response.data[0].embedding
strings_and_relatednesses = [
(row["filepath"], relatedness_fn(query_embedding, row["embedding"]))
for i, row in df.iterrows()
]
strings_and_relatednesses.sort(key=lambda x: x[1], reverse=True)
strings, relatednesses = zip(*strings_and_relatednesses)
return strings[:top_n]
def read_pdf(filepath):
"""Takes a filepath to a PDF and returns a string of the PDF's contents"""
# creating a pdf reader object
reader = PdfReader(filepath)
pdf_text = ""
page_number = 0
for page in reader.pages:
page_number += 1
pdf_text += page.extract_text() + f"\nPage Number: {page_number}"
return pdf_text
# Split a text into smaller chunks of size n, preferably ending at the end of a sentence
def create_chunks(text, n, tokenizer):
"""Returns successive n-sized chunks from provided text."""
tokens = tokenizer.encode(text)
i = 0
while i < len(tokens):
# Find the nearest end of sentence within a range of 0.5 * n and 1.5 * n tokens
j = min(i + int(1.5 * n), len(tokens))
while j > i + int(0.5 * n):
# Decode the tokens and check for full stop or newline
chunk = tokenizer.decode(tokens[i:j])
if chunk.endswith(".") or chunk.endswith("\n"):
break
j -= 1
# If no end of sentence found, use n tokens as the chunk size
if j == i + int(0.5 * n):
j = min(i + n, len(tokens))
yield tokens[i:j]
i = j
def summarize_text(query):
"""This function does the following:
- Reads in the arxiv_library.csv file in including the embeddings
- Finds the closest file to the user's query
- Scrapes the text out of the file and chunks it
- Summarizes each chunk in parallel
- Does one final summary and returns this to the user"""
# A prompt to dictate how the recursive summarizations should approach the input paper
summary_prompt = """Summarize this text from an academic paper. Extract any key points with reaso
# If the library is empty (no searches have been performed yet), we perform one and download the
library_df = pd.read_csv(paper_dir_filepath).reset_index()
if len(library_df) == 0:
print("No papers searched yet, downloading first.")
get_articles(query)
print("Papers downloaded, continuing")
library_df = pd.read_csv(paper_dir_filepath).reset_index()
library_df.columns = ["title", "filepath", "embedding"]
library_df["embedding"] = library_df["embedding"].apply(ast.literal_eval)
strings = strings_ranked_by_relatedness(query, library_df, top_n=1)
print("Chunking text from paper")
pdf_text = read_pdf(strings[0])
# Initialise tokenizer
tokenizer = tiktoken.get_encoding("cl100k_base")
results = ""
# Final summary
print("Summarizing into overall summary")
response = client.chat.completions.create(
model=GPT_MODEL,
messages=[
{
"role": "user",
"content": f"""Write a summary collated from this collection of key points extracted
The summary should highlight the core argument, conclusions and evidence, and
User query: {query}
The summary should be structured in bulleted lists following the headings Cor
Key points:\n{results}\nSummary:\n""",
}
],
temperature=0,
)
return response
print(chat_test_response.choices[0].message.content)
The academic paper discusses the unique decomposition of generators of completely positive dyna
Configure Agent
We'll create our agent in this step, including a Conversation class to support multiple turns
with the API, and some Python functions to enable interaction between the ChatCompletion
API and our knowledge base functions.
class Conversation:
def __init__(self):
self.conversation_history = []
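The Conversation class also needs the add_message method used later in the walkthrough; a minimal sketch:

    def add_message(self, role, content):
        message = {"role": role, "content": content}
        self.conversation_history.append(message)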
if full_message.message.function_call.name == "get_articles":
try:
parsed_output = json.loads(
full_message.message.function_call.arguments
)
print("Getting search results")
results = get_articles(parsed_output["query"])
except Exception as e:
print(parsed_output)
print(f"Function execution failed")
print(f"Error message: {e}")
messages.append(
{
"role": "function",
"name": full_message.message.function_call.name,
"content": str(results),
}
)
try:
print("Got search results, summarizing content")
response = chat_completion_request(messages)
return response
except Exception as e:
print(type(e))
raise Exception("Function chat request failed")
elif (
full_message.message.function_call.name == "read_article_and_summarize"
):
parsed_output = json.loads(
full_message.message.function_call.arguments
)
print("Finding and reading paper")
summary = summarize_text(parsed_output["query"])
return summary
else:
raise Exception("Function does not exist and cannot be called")
arXiv conversation
Let's put this all together by testing our functions out in conversation.
# Start with a system message
paper_system_message = """You are arXivGPT, a helpful assistant who pulls academic papers to answer user questions.
You summarize the papers clearly so the customer can decide which to read to answer their question.
You always provide the article_url and title so the user can understand the name of the paper and click through to access it.
Begin!"""
paper_conversation = Conversation()
paper_conversation.add_message("system", paper_system_message)
# Add another user message to induce our system to use the second tool
paper_conversation.add_message(
"user",
"Can you read the PPO sequence generation paper for me and give me a summary",
)
updated_response = chat_completion_with_function_execution(
paper_conversation.conversation_history, functions=arxiv_functions
)
display(Markdown(updated_response.choices[0].message.content))
Using logprobs
James Hills, Shyamal Anadkat
Dec 19, 2023
This notebook demonstrates the use of the logprobs parameter in the Chat Completions API.
When logprobs is enabled, the API returns the log probabilities of each output token, along
with a limited number of the most likely tokens at each token position and their log
probabilities. The relevant request parameters are:
logprobs : Whether to return log probabilities of the output tokens or not. If true, returns
the log probabilities of each output token returned in the content of message. This option
is currently not available on the gpt-4-vision-preview model.
top_logprobs : An integer between 0 and 5 specifying the number of most likely tokens to
return at each token position, each with an associated log probability. logprobs must be
set to true if this parameter is used.
Log probabilities of output tokens indicate the likelihood of each token occurring in the
sequence given the context. To simplify, a logprob is log(p) , where p = probability of a token
occurring at a specific position based on the previous tokens in the context. Some key points
about logprobs :
Higher log probabilities suggest a higher likelihood of the token in that context. This allows
users to gauge the model's confidence in its output or explore alternative responses the
model considered.
Logprob can be any negative number or 0.0 . 0.0 corresponds to 100% probability.
Logprobs allow us to compute the joint probability of a sequence as the sum of the
logprobs of the individual tokens. This is useful for scoring and ranking model outputs.
Another common approach is to take the average per-token logprob of a sentence to
choose the best generation.
We can examine the logprobs assigned to different candidate tokens to understand what
options the model considered plausible or implausible.
While there are a wide array of use cases for logprobs , this notebook will focus on its use for:
1. Classification tasks
Large Language Models excel at many classification tasks, but accurately measuring the
model's confidence in its outputs can be challenging. logprobs provide a probability
associated with each class prediction, enabling users to set their own classification or
confidence thresholds.
2. Retrieval (Q&A) evaluation
logprobs can assist with self-evaluation in retrieval applications. In the Q&A example, the
model outputs a contrived has_sufficient_context_for_answer boolean, which can serve
as a confidence score of whether the answer is contained in the retrieved content.
Evaluations of this type can reduce retrieval-based hallucinations and enhance accuracy.
3. Autocomplete
4. Token highlighting and outputting bytes
Users can easily create a token highlighter using the built-in tokenization that comes with
enabling logprobs. Additionally, the bytes parameter includes the ASCII encoding of each
output character, which is particularly useful for reproducing emojis and special characters.
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
def get_completion(
messages: list[dict[str, str]],
model: str = "gpt-4",
max_tokens=500,
temperature=0,
stop=None,
seed=123,
tools=None,
logprobs=None, # whether to return log probabilities of the output tokens or not. If true, retur
top_logprobs=None,
) -> str:
params = {
"model": model,
"messages": messages,
"max_tokens": max_tokens,
"temperature": temperature,
"stop": stop,
"seed": seed,
"logprobs": logprobs,
"top_logprobs": top_logprobs,
}
if tools:
params["tools"] = tools
completion = client.chat.completions.create(**params)
return completion
Let's say we want to create a system to classify news articles into a set of pre-defined categories.
Without logprobs , we can use Chat Completions to do this, but it is much more difficult to
assess the certainty with which the model made its classifications.
Now, with logprobs enabled, we can see exactly how confident the model is in its predictions,
which is crucial for creating an accurate and trustworthy classifier. For example, if the log
probability for the chosen category is high, this suggests the model is quite confident in its
classification. If it's low, this suggests the model is less confident. This can be particularly useful
in cases where the model's classification is not what you expected, or when the model's output
needs to be reviewed or validated by a human.
We'll begin with a prompt that presents the model with four categories: Technology, Politics,
Sports, and Arts. The model is then tasked with classifying articles into these categories based
solely on their headlines.
CLASSIFICATION_PROMPT = """You will be given a headline of a news article.
Classify the article into one of the following categories: Technology, Politics, Sports, and Art.
Return only the name of the category, and nothing else.
MAKE SURE your output is one of the four categories stated.
Article headline: {headline}"""
Let's look at three sample headlines, and first begin with standard Chat Completions output
without logprobs (a sketch of this call follows the list of headlines below).
headlines = [
"Tech Giant Unveils Latest Smartphone Model with Advanced Photo-Editing Features.",
"Local Mayor Launches Initiative to Enhance Urban Public Transport.",
"Tennis Champion Showcases Hidden Talents in Symphony Orchestra Debut",
]
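The loop that produced the output below isn't included in this extract; a minimal sketch of what it likely looks like, reusing the get_completion helper and CLASSIFICATION_PROMPT defined above:

for headline in headlines:
    print(f"\nHeadline: {headline}")
    API_RESPONSE = get_completion(
        [{"role": "user", "content": CLASSIFICATION_PROMPT.format(headline=headline)}],
        model="gpt-4",
    )
    # Without logprobs we only see the chosen category, not the model's confidence
    print(f"Category: {API_RESPONSE.choices[0].message.content}\n")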
Headline: Tech Giant Unveils Latest Smartphone Model with Advanced Photo-Editing Features.
Category: Technology
Here we can see the selected category for each headline. However, we have no visibility into the
confidence of the model in its predictions. Let's rerun the same prompt but with logprobs
enabled, and top_logprobs set to 2 (this will show us the 2 most likely output tokens for each
token). Additionally, we can output the linear probability of each output token, in order to
convert the log probability to the more easily interpretable scale of 0-100%.
for headline in headlines:
    print(f"\nHeadline: {headline}")
    API_RESPONSE = get_completion(
        [{"role": "user", "content": CLASSIFICATION_PROMPT.format(headline=headline)}],
        model="gpt-4",
        logprobs=True,
        top_logprobs=2,
    )
    top_two_logprobs = API_RESPONSE.choices[0].logprobs.content[0].top_logprobs
    html_content = ""
    for i, logprob in enumerate(top_two_logprobs, start=1):
        html_content += (
            f"<span style='color: cyan'>Output token {i}:</span> {logprob.token}, "
            f"<span style='color: darkorange'>logprobs:</span> {logprob.logprob}, "
            f"<span style='color: magenta'>linear probability:</span> {np.round(np.exp(logprob.logprob)*100, 2)}%<br>"
        )
    display(HTML(html_content))
    print("\n")
Headline: Tech Giant Unveils Latest Smartphone Model with Advanced Photo-Editing Features.
As expected from the first two headlines, gpt-4 is nearly 100% confident in its classifications, as
the content is clearly technology and politics focused respectively. However, the third headline
combines both sports and art-related themes, so we see the model is less confident in its
selection.
This shows how useful logprobs can be: if we are using LLMs for classification tasks, we can set
confidence thresholds, or output several candidate tokens when the log probability of the
selected output is not sufficiently high. For instance, if we are building a recommendation
engine to tag articles, we can automatically classify headlines that cross a certain confidence
threshold, and send the less certain headlines for manual review.
2. Retrieval confidence scoring to reduce hallucinations
To reduce hallucinations and improve the performance of our RAG-based Q&A system, we can
use logprobs to evaluate how confident the model is in its retrieval.
Let's say we have built a retrieval system using RAG for Q&A, but are struggling with
hallucinated answers to our questions. Note: we will use a hardcoded article for this example,
but see other entries in the cookbook for tutorials on using RAG for Q&A.
# Article retrieved
ada_lovelace_article = """Augusta Ada King, Countess of Lovelace (née Byron; 10 December 1815 – 27 No
Ada Byron was the only legitimate child of poet Lord Byron and reformer Lady Byron. All Lovelace's ha
Her educational and social exploits brought her into contact with scientists such as Andrew Crosse, C
When she was eighteen, her mathematical talents led her to a long working relationship and friendship
Between 1842 and 1843, Ada translated an article by the military engineer Luigi Menabrea (later Prime
Lovelace's notes are important in the early history of computers, especially since the seventh one co
"""
Now, what we can do is ask the model to respond to the question, but then also evaluate its
response. Specifically, we will ask the model to output a boolean
has_sufficient_context_for_answer . We can then evaluate the logprobs to see just how
confident the model is that its answer was contained in the provided context
PROMPT = """You retrieved this article: {article}. The question is: {question}.
Before even answering the question, consider whether you have sufficient information in the article t
Your output should JUST be the boolean true or false, of if you have sufficient information in the ar
Respond with just one word, the boolean true or false. You must output the word 'True', or the word
"""
html_output = ""
html_output += "Questions clearly answered in article"
display(HTML(html_output))
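The evaluation loop itself is not shown in this extract. A minimal sketch of what it might look like follows; the example questions are illustrative, not the ones used in the original notebook:

# Illustrative questions: the first two are clearly answered by the article,
# the last two are not.
easy_questions = [
    "Who was Ada Lovelace's father?",
    "When was Ada Lovelace born?",
]
hard_questions = [
    "Did Lovelace win any scientific awards?",
    "What was Lovelace's favourite colour?",
]

for question in easy_questions + hard_questions:
    API_RESPONSE = get_completion(
        [{"role": "user", "content": PROMPT.format(article=ada_lovelace_article, question=question)}],
        model="gpt-4",
        logprobs=True,
    )
    answer = API_RESPONSE.choices[0].message.content  # 'True' or 'False'
    logprob = API_RESPONSE.choices[0].logprobs.content[0].logprob
    print(
        f"Question: {question}\n"
        f"has_sufficient_context_for_answer: {answer}, "
        f"linear probability: {np.round(np.exp(logprob) * 100, 2)}%\n"
    )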
For the first two questions, our model asserts with (near) 100% confidence that the article has
sufficient context to answer the posed questions. On the other hand, for the more tricky
questions which are less clearly answered in the article, the model is less confident that it has
sufficient context. This is a great guardrail to help ensure our retrieved content is sufficient. This
self-evaluation can help reduce hallucinations, as you can restrict answers or re-prompt the user
when your has_sufficient_context_for_answer log probability is below a certain threshold.
Methods like this have been shown to significantly reduce hallucinations and errors in
RAG-based Q&A (Example).
3. Autocomplete
Another use case for logprobs is autocomplete systems. Without building the entire
autocomplete system end-to-end, let's demonstrate how logprobs could help us decide how
to suggest words as a user is typing.
First, let's come up with a sample sentence: "My least favorite TV show is Breaking Bad."
Let's say we want it to dynamically recommend the next word or token as we are typing the
sentence, but only if the model is quite sure of what the next word will be. To demonstrate this,
let's break up the sentence into sequential components.
sentence_list = [
"My",
"My least",
"My least favorite",
"My least favorite TV",
"My least favorite TV show",
"My least favorite TV show is",
"My least favorite TV show is Breaking Bad",
]
Now, we can ask gpt-3.5-turbo to act as an autocomplete engine with whatever context the
model is given. We can enable logprobs and can see how confident the model is in its
prediction.
high_prob_completions = {}
low_prob_completions = {}
html_output = ""
display(HTML(html_output))
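The loop that populates these dictionaries is omitted from this extract; a minimal sketch of what it might look like (the prompt wording and the 0.95 / 0.60 probability thresholds are illustrative):

high_prob_completions = {}
low_prob_completions = {}
html_output = ""

for sentence in sentence_list:
    PROMPT = f"""Complete this sentence. You are acting as autocomplete. Simply complete the sentence to the best of your ability, making sure it is just ONE sentence: {sentence}"""
    API_RESPONSE = get_completion(
        [{"role": "user", "content": PROMPT}],
        model="gpt-3.5-turbo",
        logprobs=True,
        top_logprobs=3,
    )
    html_output += f"<p style='color: cyan'>Sentence: {sentence}</p>"
    # Look at the candidates for the first predicted token only
    for token in API_RESPONSE.choices[0].logprobs.content[0].top_logprobs:
        linear_prob = np.exp(token.logprob)
        html_output += (
            f"<p>Predicted next token: {token.token}, "
            f"linear probability: {np.round(linear_prob * 100, 2)}%</p>"
        )
        if linear_prob > 0.95:
            high_prob_completions[sentence] = token.token
        if linear_prob < 0.60:
            low_prob_completions[sentence] = token.token

display(HTML(html_output))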
Sentence: My
Sentence: My least
high_prob_completions
These look reasonable! We can feel confident in those suggestions. It's pretty likely you want to
write 'show' after writing 'My least favorite TV'! Now let's look at the autocompletion
suggestions the model was less confident about:
low_prob_completions
{'My least favorite': 'food', 'My least favorite TV show is': '"My'}
These are logical as well. It's pretty unclear what the user is going to say with just the prefix 'My
least favorite', and it's really anyone's guess what the author's least favorite TV show is. So, using
gpt-3.5-turbo , we can create the root of a dynamic autocompletion engine with logprobs !
4. Token highlighting and outputting bytes
Let's quickly touch on creating a simple token highlighter with logprobs , and using the bytes
parameter. First, we can create a function that counts and highlights each token. While this
doesn't use the log probabilities, it uses the built-in tokenization that comes with enabling
logprobs .
PROMPT = """What's the longest word in the English language?"""
API_RESPONSE = get_completion(
[{"role": "user", "content": PROMPT}], model="gpt-4", logprobs=True, top_logprobs=5
)
def highlight_text(api_response):
    colors = [
        "#FF00FF",  # Magenta
        "#008000",  # Green
        "#FF8C00",  # Dark Orange
        "#FF0000",  # Red
        "#0000FF",  # Blue
    ]
    tokens = api_response.choices[0].logprobs.content
    # Cycle through the colors, wrapping each token in a coloured span
    html_output = ""
    for i, token in enumerate(tokens):
        html_output += f"<span style='color: {colors[i % len(colors)]}'>{token.token}</span>"
    display(HTML(html_output))
    print(f"Total number of tokens: {len(tokens)}")
highlight_text(API_RESPONSE)
The longest word in the English language, according to the Guinness World Records, is
'pneumonoultramicroscopicsilicovolcanoconiosis'. It is a type of lung disease caused by
inhaling ash and sand dust.
Next, let's reconstruct a sentence using the bytes parameter. With logprobs enabled, we are
given both each token and the UTF-8 byte values (as decimal integers) of the token string. These
byte values are helpful when handling tokens that contain or consist of emojis or special characters.
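The prompt and aggregation code for the output below aren't included in this extract; a minimal sketch, assuming a prompt that asks for the blue heart emoji and its name (which matches the tokens printed below):

PROMPT = """Output the blue heart emoji and its name."""
API_RESPONSE = get_completion(
    [{"role": "user", "content": PROMPT}], model="gpt-4", logprobs=True
)

aggregated_bytes = []
joint_logprob = 0.0

# Iterate over tokens, aggregate their bytes and accumulate the joint log probability
for token in API_RESPONSE.choices[0].logprobs.content:
    print(f"Token: {token.token}")
    print(f"Log prob: {token.logprob}")
    print(f"Linear prob: {np.round(np.exp(token.logprob) * 100, 2)} %")
    print(f"Bytes: {token.bytes}\n")
    aggregated_bytes += token.bytes
    joint_logprob += token.logprob

# Decode the aggregated bytes back into text
aggregated_text = bytes(aggregated_bytes).decode("utf-8")
print(f"Bytes array: {aggregated_bytes}")
print(f"Decoded bytes: {aggregated_text}")
print(f"Joint prob: {np.round(np.exp(joint_logprob) * 100, 2)} %")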
# Assert that the decoded text is the same as the message content
assert API_RESPONSE.choices[0].message.content == aggregated_text
Token: \xf0\x9f\x92
Log prob: -0.0003056686
Linear prob: 99.97 %
Bytes: [240, 159, 146]
Token: \x99
Log prob: 0.0
Linear prob: 100.0 %
Bytes: [153]
Token: -
Log prob: -0.0096905725
Linear prob: 99.04 %
Bytes: [32, 45]
Token: Blue
Log prob: -0.00042042506
Linear prob: 99.96 %
Bytes: [32, 66, 108, 117, 101]
Token: Heart
Log prob: -7.302705e-05
Linear prob: 99.99 %
Bytes: [32, 72, 101, 97, 114, 116]
Decoded bytes: 💙
Bytes array: [240, 159, 146, 153, 32, 45, 32, 66, 108, 117, 101, 32, 72, 101, 97, 114, 116]
- Blue Heart
Joint prob: 98.96 %
Here, we see that the first token was \xf0\x9f\x92 , a partial UTF-8 sequence; we can take its byte
values and append them to a bytes array. Then, we can easily decode this array into a full
sentence, and validate with our assert statement that the decoded bytes match our completion
message!
Additionally, we can get the joint probability of the entire completion, which is the product of
each token's linear probability (equivalently, the exponential of the sum of the token log
probabilities). This tells us how likely this particular completion is given the prompt. Since our
prompt is quite directive (asking for a certain emoji and its name), the joint probability of this
output is high! If we ask for a random output, however, we'll see a much lower joint probability.
This can also be a good tactic for developers during prompt engineering.
5. Conclusion
Nice! We were able to use the logprobs parameter to build a more robust classifier, evaluate
our retrieval for Q&A system, and encode and decode each 'byte' of our tokens! logprobs
adds useful information and signal to our completions output, and we are excited to see how
developers incorporate it to improve applications.
6. Possible extensions
There are many other use cases for logprobs that are not covered in this cookbook. We can
use logprobs for:
Moderation
Keyword selection
Token healing
and more!
This notebook illustrates the use of the new Assistants API (GPT-4), and DALL·E-3 in crafting
informative and visually appealing slides. Creating slides is a pivotal aspect of many jobs, but
can be laborious and time-consuming. Additionally, extracting insights from data and
articulating them effectively on slides can be challenging. This cookbook recipe will
demonstrate how you can utilize the new Assistants API to facilitate the end-to-end slide creation
process for you, without you having to touch Microsoft PowerPoint or Google Slides, saving you
valuable time and effort!
0. Setup
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))

def submit_message(assistant_id, thread, user_message, file_ids=None):
    # Attach the user message (and any files) to the thread, then start a run
    params = {"thread_id": thread.id, "role": "user", "content": user_message}
    if file_ids:
        params["file_ids"] = file_ids
    client.beta.threads.messages.create(
        **params
    )
    return client.beta.threads.runs.create(
        thread_id=thread.id,
        assistant_id=assistant_id,
    )

def get_response(thread):
    return client.beta.threads.messages.list(thread_id=thread.id)
In this recipe, we will be creating a brief fictional presentation for the quarterly financial review
of our company, NotReal Corporation. We want to highlight some key trends we are seeing that
are affecting the profitability of our company. Let's say we have some financial data at our
disposal. Let's load in the data and take a look...
financial_data_path = 'data/NotRealCorp_financial_data.json'
financial_data = pd.read_json(financial_data_path)
financial_data.head(5)
Year Quarter Distribution channel Revenue ($M) Costs ($M) Customer count Time
As you can see, this data has quarterly revenue, costs and customer data across different
distribution channels. Let's create an Assistant that can act as a personal analyst and make a
nice visualization for our PowerPoint!
First, we need to upload our file so our Assistant can access it.
file = client.files.create(
file=open('data/NotRealCorp_financial_data.json',"rb"),
purpose='assistants',
)
Now, we're ready to create our Assistant. We can instruct our assistant to act as a data scientist,
and take any queries we give it and run the necessary code to output the proper data
visualization. The instructions parameter here is akin to system instructions in the
ChatCompletions endpoint, and can help guide the assistant. We can also turn on the Code
Interpreter tool, so our Assistant will be able to write and run code. Finally, we can specify any
files we want to use, which in this case is just the financial_data file we created above.
assistant = client.beta.assistants.create(
instructions="You are a data scientist assistant. When given data and a query, write the proper cod
model="gpt-4-1106-preview",
tools=[{"type": "code_interpreter"}],
file_ids=[file.id]
)
Let's create a thread now, and as our first request ask the Assistant to calculate quarterly profits,
and then plot the profits by distribution channel over time. The assistant will automatically
calculate the profit for each quarter, and also create a new column combining quarter and year,
without us having to ask for that directly. We can also specify the colors of each line.
thread = client.beta.threads.create(
messages=[
{
"role": "user",
"content": "Calculate profit (revenue minus cost) by quarter and year, and visualize as a line
"file_ids": [file.id]
}
]
)
run = client.beta.threads.runs.create(
thread_id=thread.id,
assistant_id=assistant.id,
)
We can now start a loop that will check if the image has been created. Note: This may take a few
minutes
messages = client.beta.threads.messages.list(thread_id=thread.id)
import time
while True:
messages = client.beta.threads.messages.list(thread_id=thread.id)
try:
#See if image has been created
messages.data[0].content[0].image_file
#Sleep to make sure run has completed
time.sleep(5)
print('Plot created!')
break
except:
time.sleep(10)
print('Assistant still working...')
messages = client.beta.threads.messages.list(thread_id=thread.id)
[message.content[0] for message in messages.data]
[MessageContentImageFile(image_file=ImageFile(file_id='file-0rKABLygI02MgwwhpgWdRFY1'), type='i
MessageContentText(text=Text(annotations=[], value="The profit has been calculated for each di
MessageContentText(text=Text(annotations=[], value="The JSON data has been successfully restru
MessageContentText(text=Text(annotations=[], value='The structure of the JSON data shows that
MessageContentText(text=Text(annotations=[], value="The JSON data has been incorrectly loaded
MessageContentText(text=Text(annotations=[], value="It seems that the file content was success
MessageContentText(text=Text(annotations=[], value="It appears that the content of the datafra
MessageContentText(text=Text(annotations=[], value="Before we can calculate profits and visual
MessageContentText(text=Text(annotations=[], value='Calculate profit (revenue minus cost) by q
We can see that the last message (latest message is shown first) from the assistant contains the
image file we are looking for. An interesting note here is that the Assistant was able to attempt
several times to parse the JSON data, as the first parsing was unsuccessful, demonstrating the
assistant's adaptability.
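The convert_file_to_png helper used below isn't defined in this extract; a minimal sketch of what it likely does (download the generated file from the API and write it to disk):

def convert_file_to_png(file_id, write_path):
    # Retrieve the file content generated by the Assistant and save it locally
    data = client.files.content(file_id)
    data_bytes = data.read()
    with open(write_path, "wb") as file:
        file.write(data_bytes)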
plot_file_id = messages.data[0].content[0].image_file.file_id
image_path = "../images/NotRealCorp_chart.png"
convert_file_to_png(plot_file_id,image_path)
#Upload
plot_file = client.files.create(
file=open(image_path, "rb"),
purpose='assistants'
)
2. Generating insights
To get insights from our image, we simply need to add a new message to our thread. Our
Assistant will know to use the message history to give us some concise takeaways from the
visual provided.
Now, once the run has completed, we can see the latest message
# Hard coded wait for a response, as the assistant may iterate on the bullets.
time.sleep(10)
response = get_response(thread)
bullet_points = response.data[0].content[0].text.value
print(bullet_points)
The plot reveals a consistent upward trend in profits for all distribution channels, indicating
Cool! So our assistant was able to identify the noteworthy growth in Online Sales profit, and
infer that this shows the importance of a large digital presence. Now let's get a compelling title
for the slide.
Nice, now we have a title, a plot and two bullet points. We're almost ready to put this all on a
slide, but as a final step, let's have DALL·E-3 come up with an image to use as the title slide of
the presentation. Note: DALL·E-3 is not yet available within the assistants API but is coming
soon! We'll feed in a brief description of our company (NotRealCorp) and have DALL·E-3 do the
rest!
company_summary = "NotReal Corp is a prominent hardware company that manufactures and sells processor
response = client.images.generate(
model='dall-e-3',
prompt=f"given this company summary {company_summary}, create an inspirational \
photo showing the growth and path forward. This will be used at a quarterly\
financial planning meeting",
size="1024x1024",
quality="hd",
n=1
)
image_url = response.data[0].url
Cool, now we can add this image to our thread. First, we can save the image locally, then upload
it to the assistants API using the File upload endpoint. Let's also take a look at our image
dalle_img_path = '../images/dalle_image.png'
img = requests.get(image_url)
#Save locally
with open(dalle_img_path,'wb') as file:
file.write(img.content)
#Upload
dalle_file = client.files.create(
file=open(dalle_img_path, "rb"),
purpose='assistants'
)
4. Creating the slides
We now have all the content we need to create the slides. While we could simply add a message
asking for slides, let's instead give the assistant a slide template, built with the python-pptx
library, to use. This will ensure we get a deck in the style we want. See the Extensions section
at the end of the notebook for notes on creating the template.
title_template = """
from pptx import Presentation
from pptx.util import Inches, Pt
from pptx.enum.text import PP_PARAGRAPH_ALIGNMENT
from pptx.dml.color import RGBColor
# Add image to the left side of the slide with a margin at the top and bottom
left = Inches(0)
top = Inches(0)
height = prs.slide_height
width = prs.slide_width * 3/5
pic = slide.shapes.add_picture(image_path, left, top, width=width, height=height)
data_vis_template = """
from pptx import Presentation
from pptx.util import Inches, Pt
from pptx.enum.text import PP_PARAGRAPH_ALIGNMENT
from pptx.dml.color import RGBColor
# Create a new presentation object
prs = Presentation()
# Define placeholders
image_path = data_vis_img
title_text = "Maximizing Profits: The Dominance of Online Sales & Direct Sales Optimization"
bullet_points = "• Online Sales consistently lead in profitability across quarters, indicating a stro
bullet_p = insights_frame.add_paragraph()
bullet_p.text = bullet_points
bullet_p.font.size = Pt(12)
bullet_p.font.color.rgb = RGBColor(255, 255, 255)
bullet_p.line_spacing = 1.5
"""
Let's set a few quick variables for our slides. We want the company name, NotRealCorp, to be
on the title slide, and the title of the presentation should be 'Quarterly financial planning
meeting, Q3 2023'.
title_text = "NotRealCorp"
subtitle_text = "Quarterly financial planning meeting, Q3 2023"
Here we have a template to create a Title Slide. The template above was created by uploading
an image of a desirable title slide to GPT-4-Vision and asking for the python-pptx code to create
that template. The inputs to the template are the image_path, title_text, and subtitle_text.
submit_message(assistant.id,thread,f"Use the included code template to create a PPTX slide that follo
{title_template}. IMPORTANT: Use the image file included in this message as the image_path image in t
use the subtitle_text {subtitle_text} as the subtitle_text variable. \
NEXT, create a SECOND slide using the following code template: {data_vis_template} to create a PP
{data_vis_template}. IMPORTANT: Use the line plot image, that is the second attached image in this me
the bullet points of insights you created earlier for the bullet_points variable. Output these TWO
file_ids=[dalle_file.id, plot_file.id]
)
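The response used below is fetched from the thread once the slide-creation run has finished; a minimal sketch (a fixed sleep is a simplification; polling the run status would be more robust):

# Give the Assistant time to generate both slides before reading the messages
time.sleep(30)
response = get_response(thread)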
pptx_id = response.data[0].content[0].text.annotations[0].file_path.file_id
ppt_file= client.files.content(pptx_id)
file_obj = io.BytesIO(ppt_file.read())
with open("data/created_slides.pptx", "wb") as f:
f.write(file_obj.getbuffer())
Now, we have a PPTX file saved with all of our created content!
Let's look at the screenshots of the .pptx we just created using JUST the assistants API and
DALL·E-3. We don't have a seed parameter yet in the Assistants API, so the DALL·E-3 image and
wordings will be slightly different from what you see when you run this notebook, due to the
non-determinism of LLMs, but the outputs should be directionally the same.
Woo! While these slides could use some formatting tweaks, we have made some great content
using the Assistants API, GPT-4 and DALL·E-3. We were able to take a JSON file with financial
data, use our assistant to calculate profit by quarter across distribution channels, plot the
results, identify insights and key takeaways from the visualization, and create a summary
title. And, given just a description of our company, NotRealCorp, we used DALL·E-3 to make an
awesome title image. While we are still a ways away from entirely automating this process
without a human in the loop, hopefully this notebook can make the slide creation process a bit
easier for you. More importantly, this notebook can ideally give you a glimpse into the potential
of the assistants API! We're excited to see what you build.
6. Extensions
When DALL·E-3 is incorporated in the Assistants API, we will have the ability to request the
generated title image within the thread.
GPT-4-Vision is not yet supported in the Assistants API, but could have been used to gather
insights from the line plot image.
GPT-4-Vision was used to generate the python-pptx template included in this recipe, so a
potential extension project could be demonstrating best practices around converting
images to slide templates.
This notebook covers use cases where your data is unlabelled but has features that can be used
to cluster them into meaningful categories. The challenge with clustering is making the features
that make those clusters stand out human-readable, and that is where we'll look to use GPT-3
to generate meaningful cluster descriptions for us. We can then use these to apply labels to a
previously unlabelled dataset.
To feed the model we use embeddings created using the approach shown in the Multiclass
classification for transactions notebook, applied to the full 359 transactions in the dataset to
give us a bigger pool for learning.
Setup
True
# imports
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
COMPLETIONS_MODEL = "gpt-3.5-turbo"
Clustering
We'll reuse the approach from the Clustering Notebook, using K-Means to cluster our dataset
using the feature embeddings we created previously. We'll then use the Completions endpoint
to generate cluster descriptions for us and judge their effectiveness
df = pd.read_csv(embedding_path)
df.head()
Date Supplier Description Transaction value (£) combined n_tokens embedding
embedding_df = pd.read_csv(embedding_path)
embedding_df["embedding"] = embedding_df.embedding.apply(literal_eval).apply(np.array)
matrix = np.vstack(embedding_df.embedding.values)
matrix.shape
(359, 1536)
n_clusters = 5
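The K-Means fit itself isn't included in this extract; a minimal sketch of how the Cluster labels used below might be assigned (assuming scikit-learn):

from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=n_clusters, init="k-means++", random_state=42, n_init=10)
kmeans.fit(matrix)
embedding_df["Cluster"] = kmeans.labels_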
tsne = TSNE(
n_components=2, perplexity=15, random_state=42, init="random", learning_rate=200
)
vis_dims2 = tsne.fit_transform(matrix)
x = [x for x, y in vis_dims2]
y = [y for x, y in vis_dims2]
avg_x = xs.mean()
avg_y = ys.mean()
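The avg_x / avg_y lines above are fragments of a per-cluster plotting loop that was lost in extraction; a sketch of what it plausibly looked like (the colour names are illustrative):

import matplotlib.pyplot as plt

# Colour the t-SNE points by cluster and mark each cluster's centre with an 'x'
for category, color in enumerate(["purple", "green", "red", "blue", "orange"]):
    xs = np.array(x)[embedding_df.Cluster == category]
    ys = np.array(y)[embedding_df.Cluster == category]
    plt.scatter(xs, ys, color=color, alpha=0.3)
    plt.scatter(xs.mean(), ys.mean(), marker="x", color=color, s=100)
plt.title("Clusters of transaction embeddings visualised with t-SNE")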
for i in range(n_clusters):
print(f"Cluster {i} Theme:\n")
transactions = "\n".join(
embedding_df[embedding_df.Cluster == i]
.combined.str.replace("Supplier: ", "")
.str.replace("Description: ", ": ")
.str.replace("Value: ", ": ")
.sample(transactions_per_cluster, random_state=42)
.values
)
response = client.chat.completions.create(
model=COMPLETIONS_MODEL,
# We'll include a prompt to instruct the model what sort of description we're looking for
messages=[
{"role": "user",
"content": f'''We want to group these transactions into meaningful clusters so we can ta
What do the following transactions have in common?\n\nTransactions:\n"""\n{transactio
],
temperature=0,
max_tokens=100,
top_p=1,
frequency_penalty=0,
presence_penalty=0,
)
print(response.choices[0].message.content.replace("\n", ""))
print("\n")
print("-" * 100)
print("\n")
Cluster 0 Theme:
The common theme among these transactions is that they all involve spending money on various ex
Cluster 1 Theme:
The common theme among these transactions is that they all involve payments for various goods a
Institute of Conservation, This payment covers 2 invoices for student bursary costs
PRIVATE SALE, Collection of papers of an individual
LEE BOYD LIMITED, Architectural Works
ALDL, Legal Deposit Services
RICK GEKOSKI, Papers 1970's to 2019 Alisdair Gray
Conclusion
We now have five new clusters that we can use to describe our data. Looking at the visualisation,
some of our clusters have some overlap and we'll need some tuning to get to the right place,
but already we can see that GPT-3 has made some effective inferences. In particular, it picked
up that items including legal deposits were related to literature archival, which is true even
though the model was given no clues about that. Very cool, and with some tuning we can create
a base set of clusters that we can then use with a multiclass classifier to generalise to other
transactional datasets we might use.
This notebook covers how to use the Chat Completions API in combination with external
functions to extend the capabilities of GPT models.
tools is an optional parameter in the Chat Completion API which can be used to provide
function specifications. The purpose of this is to enable models to generate function arguments
which adhere to the provided specifications. Note that the API will not actually execute any
function calls. It is up to developers to execute function calls using model outputs.
Within the tools parameter, if functions are provided, then by default the model will decide
when it is appropriate to use one of them. The API can be forced to use a specific function by
setting the tool_choice parameter to {"type": "function", "function": {"name": "<insert-function-
name>"}} . The API can also be forced to not use any function by setting the tool_choice
parameter to "none" . If a function is used, the output will contain "finish_reason":
"tool_calls" in the response, as well as a tool_calls object that has the name of the
function and the generated function arguments.
Overview
This notebook contains the following 2 sections:
How to generate function arguments: Specify a set of functions and use the API to
generate function arguments.
How to call functions with model generated arguments: Close the loop by actually
executing functions with model generated arguments.
import json
from openai import OpenAI
from tenacity import retry, wait_random_exponential, stop_after_attempt
from termcolor import colored
GPT_MODEL = "gpt-3.5-turbo-0613"
client = OpenAI()
Utilities
First let's define a few utilities for making calls to the Chat Completions API and for maintaining
and keeping track of the conversation state.
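The chat_completion_request helper used throughout the rest of this section isn't included in this extract; a sketch of what it plausibly looks like, using the tenacity imports above for retries:

@retry(wait=wait_random_exponential(multiplier=1, max=40), stop=stop_after_attempt(3))
def chat_completion_request(messages, tools=None, tool_choice=None, model=GPT_MODEL):
    try:
        response = client.chat.completions.create(
            model=model,
            messages=messages,
            tools=tools,
            tool_choice=tool_choice,
        )
        return response
    except Exception as e:
        print("Unable to generate ChatCompletion response")
        print(f"Exception: {e}")
        return e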
def pretty_print_conversation(messages):
role_to_color = {
"system": "red",
"user": "green",
"assistant": "blue",
"function": "magenta",
}
Basic concepts
Let's create some function specifications to interface with a hypothetical weather API. We'll pass
these function specifications to the Chat Completions API in order to generate function
arguments that adhere to the specification.
tools = [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users locati
},
},
"required": ["location", "format"],
},
}
},
{
"type": "function",
"function": {
"name": "get_n_day_weather_forecast",
"description": "Get an N-day weather forecast",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"description": "The city and state, e.g. San Francisco, CA",
},
"format": {
"type": "string",
"enum": ["celsius", "fahrenheit"],
"description": "The temperature unit to use. Infer this from the users locati
},
"num_days": {
"type": "integer",
"description": "The number of days to forecast",
}
},
"required": ["location", "format", "num_days"]
},
}
},
]
If we prompt the model about the current weather, it will respond with some clarifying
questions.
messages = []
messages.append({"role": "system", "content": "Don't make assumptions about what values to plug into
messages.append({"role": "user", "content": "What's the weather like today"})
chat_response = chat_completion_request(
messages, tools=tools
)
assistant_message = chat_response.choices[0].message
messages.append(assistant_message)
assistant_message
ChatCompletionMessage(content='Sure, I can help you with that. Could you please provide me with
Once we provide the missing information, it will generate the appropriate function arguments
for us.
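The follow-up turn isn't shown in this extract; a minimal sketch of providing the missing location and inspecting the resulting tool call (the user message is illustrative):

# Supply the missing location; the model should now emit a tool call
messages.append({"role": "user", "content": "I'm in Glasgow, Scotland."})
chat_response = chat_completion_request(
    messages, tools=tools
)
assistant_message = chat_response.choices[0].message
messages.append(assistant_message)
assistant_message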
By prompting it differently, we can get it to target the other function we've told it about.
messages = []
messages.append({"role": "system", "content": "Don't make assumptions about what values to plug into
messages.append({"role": "user", "content": "what is the weather going to be like in Glasgow, Scotlan
chat_response = chat_completion_request(
messages, tools=tools
)
assistant_message = chat_response.choices[0].message
messages.append(assistant_message)
assistant_message
ChatCompletionMessage(content='Sure! Please provide the number of days you would like to know t
Once again, the model is asking us for clarification because it doesn't have enough information
yet. In this case it already knows the location for the forecast, but it needs to know how many
days are required in the forecast.
We can force the model to use a specific function, for example get_n_day_weather_forecast, by
using the tool_choice argument, as shown below. By doing so, we force the model to make
assumptions about how to use it.
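A sketch of forcing a specific function via tool_choice (the exact prompts are illustrative):

messages = []
messages.append({"role": "system", "content": "Don't make assumptions about what values to plug into functions. Ask for clarification if a user request is ambiguous."})
messages.append({"role": "user", "content": "Give me a weather report for Toronto, Canada."})
chat_response = chat_completion_request(
    messages,
    tools=tools,
    tool_choice={"type": "function", "function": {"name": "get_n_day_weather_forecast"}},
)
chat_response.choices[0].message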
We can also force the model to not use a function at all. By doing so we prevent it from
producing a proper function call.
messages = []
messages.append({"role": "system", "content": "Don't make assumptions about what values to plug into
messages.append({"role": "user", "content": "Give me the current weather (use Celcius) for Toronto, C
chat_response = chat_completion_request(
messages, tools=tools, tool_choice="none"
)
chat_response.choices[0].message
Newer models like gpt-4-1106-preview or gpt-3.5-turbo-1106 can call multiple functions in one
turn.
messages = []
messages.append({"role": "system", "content": "Don't make assumptions about what values to plug into
messages.append({"role": "user", "content": "what is the weather going to be like in San Francisco an
chat_response = chat_completion_request(
messages, tools=tools, model='gpt-3.5-turbo-1106'
)
assistant_message = chat_response.choices[0].message.tool_calls
assistant_message
[ChatCompletionMessageToolCall(id='call_q8k4geh0uGPRtIfOXYPB0yM8', function=Function(arguments=
ChatCompletionMessageToolCall(id='call_Hdl7Py7aLswCBPptrD4y5BD3', function=Function(arguments=
In our next example, we'll demonstrate how to execute functions whose inputs are model-
generated, and use this to implement an agent that can answer questions for us about a
database. For simplicity we'll use the Chinook sample database.
Note: SQL generation can be high-risk in a production environment since models are not
perfectly reliable at generating correct SQL.
import sqlite3
conn = sqlite3.connect("data/Chinook.db")
print("Opened database successfully")
def get_table_names(conn):
"""Return a list of table names."""
table_names = []
tables = conn.execute("SELECT name FROM sqlite_master WHERE type='table';")
for table in tables.fetchall():
table_names.append(table[0])
return table_names
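get_database_info below calls a get_column_names helper that isn't included in this extract; a minimal sketch using SQLite's PRAGMA table_info:

def get_column_names(conn, table_name):
    """Return a list of column names for a table."""
    column_names = []
    columns = conn.execute(f"PRAGMA table_info('{table_name}');").fetchall()
    for col in columns:
        column_names.append(col[1])  # the second field is the column name
    return column_names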
def get_database_info(conn):
"""Return a list of dicts containing the table name and columns for each table in the database.""
table_dicts = []
for table_name in get_table_names(conn):
columns_names = get_column_names(conn, table_name)
table_dicts.append({"table_name": table_name, "column_names": columns_names})
return table_dicts
Now we can use these utility functions to extract a representation of the database schema.
database_schema_dict = get_database_info(conn)
database_schema_string = "\n".join(
[
f"Table: {table['table_name']}\nColumns: {', '.join(table['column_names'])}"
for table in database_schema_dict
]
)
As before, we'll define a function specification for the function we'd like the API to generate
arguments for. Notice that we are inserting the database schema into the function specification.
This will be important for the model to know about.
tools = [
{
"type": "function",
"function": {
"name": "ask_database",
"description": "Use this function to answer user questions about music. Input should be a
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": f"""
SQL query extracting info to answer the user's question.
SQL should be written using this database schema:
{database_schema_string}
The query should be returned in plain text, not in JSON.
""",
}
},
"required": ["query"],
},
}
}
]
Now let's implement the function that will actually execute queries against the database.
def execute_function_call(message):
if message.tool_calls[0].function.name == "ask_database":
query = json.loads(message.tool_calls[0].function.arguments)["query"]
results = ask_database(conn, query)
else:
results = f"Error: function {message.tool_calls[0].function.name} does not exist"
return results
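The ask_database function called above isn't defined in this extract; a minimal sketch that simply runs the generated SQL against the SQLite connection:

def ask_database(conn, query):
    """Run a SQL query against the Chinook database and return the results as a string."""
    try:
        results = str(conn.execute(query).fetchall())
    except Exception as e:
        results = f"query failed with error: {e}"
    return results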
messages = []
messages.append({"role": "system", "content": "Answer user questions by generating SQL queries agains
messages.append({"role": "user", "content": "Hi, who are the top 5 artists by number of tracks?"})
chat_response = chat_completion_request(messages, tools)
assistant_message = chat_response.choices[0].message
assistant_message.content = str(assistant_message.tool_calls[0].function)
messages.append({"role": assistant_message.role, "content": assistant_message.content})
if assistant_message.tool_calls:
results = execute_function_call(assistant_message)
messages.append({"role": "function", "tool_call_id": assistant_message.tool_calls[0].id, "name":
pretty_print_conversation(messages)
system: Answer user questions by generating SQL queries against the Chinook Music Database
user: Hi, who are the top 5 artists by number of tracks?
assistant: Function(arguments='{\n "query": "SELECT artist.Name, COUNT(track.TrackId) AS
function (ask_database): [('Iron Maiden', 213), ('U2', 135), ('Led Zeppelin', 114), ('Meta
messages.append({"role": "user", "content": "What is the name of the album with the most tracks?"})
chat_response = chat_completion_request(messages, tools)
assistant_message = chat_response.choices[0].message
assistant_message.content = str(assistant_message.tool_calls[0].function)
messages.append({"role": assistant_message.role, "content": assistant_message.content})
if assistant_message.tool_calls:
results = execute_function_call(assistant_message)
messages.append({"role": "function", "tool_call_id": assistant_message.tool_calls[0].id, "name":
pretty_print_conversation(messages)
system: Answer user questions by generating SQL queries against the Chinook Music Database
user: Hi, who are the top 5 artists by number of tracks?
assistant: Function(arguments='{\n "query": "SELECT artist.Name, COUNT(track.TrackId) AS
function (ask_database): [('Iron Maiden', 213), ('U2', 135), ('Led Zeppelin', 114), ('Meta
user: What is the name of the album with the most tracks?
assistant: Function(arguments='{\n "query": "SELECT album.Title, COUNT(track.TrackId) AS
function (ask_database): [('Greatest Hits', 57)]
Next Steps
See our other notebook that demonstrates how to use the Chat Completions API and functions
for knowledge retrieval to interact conversationally with a knowledge base.
GPT excels at answering questions, but only on topics it remembers from its training data.
What should you do if you want GPT to answer questions about unfamiliar topics? E.g., recent
events after its training cutoff, your non-public documents, information from past conversations,
etc.
This notebook demonstrates a two-step Search-Ask method for enabling GPT to answer
questions using a library of reference text.
1. Search: search your library of text for relevant text sections
2. Ask: insert the retrieved text sections into a message to GPT and ask it the question
GPT can learn knowledge in two ways: via model weights (i.e., fine-tuning the model on a training
set), or via model inputs (i.e., inserting the knowledge into an input message).
Although fine-tuning can feel like the more natural option—training on data is how GPT learned
all of its other knowledge, after all—we generally do not recommend it as a way to teach the
model knowledge. Fine-tuning is better suited to teaching specialized tasks or styles, and is less
reliable for factual recall.
As an analogy, model weights are like long-term memory. When you fine-tune a model, it's like
studying for an exam a week away. When the exam arrives, the model may forget details, or
misremember facts it never read.
In contrast, message inputs are like short-term memory. When you insert knowledge into a
message, it's like taking an exam with open notes. With notes in hand, the model is more likely
to arrive at correct answers.
One downside of text search relative to fine-tuning is that each model is limited by a maximum
amount of text it can read at once. (Newer models are available with longer contexts;
gpt-4-1106-preview has a 128K-token context window.)
Continuing the analogy, you can think of the model like a student who can only look at a few
pages of notes at a time, despite potentially having shelves of textbooks to draw upon.
Therefore, to build a system capable of drawing upon large quantities of text to answer
questions, we recommend using a Search-Ask approach.
Search
Lexical-based search
Graph-based search
Embedding-based search
This example notebook uses embedding-based search. Embeddings are simple to implement
and work especially well with questions, as questions often don't lexically overlap with their
answers.
Consider embeddings-only search as a starting point for your own system. Better search
systems might combine multiple search methods, along with features like popularity, recency,
user history, redundancy with prior search results, click rate data, etc. Q&A retrieval
performance may also be improved with techniques like HyDE, in which questions are first
transformed into hypothetical answers before being embedded. Similarly, GPT can also
potentially improve search results by automatically transforming questions into sets of
keywords or search terms.
Full procedure
1. Prepare search data (once per document)
   1. Collect: We'll download a few hundred Wikipedia articles about the 2022 Olympics
   2. Chunk: Documents are split into short, mostly self-contained sections to be embedded
   3. Embed: Each section is embedded with the OpenAI API
   4. Store: Embeddings are saved (for large datasets, use a vector database)
2. Search (once per query)
   1. Given a user question, generate an embedding for the query from the OpenAI API
   2. Using the embeddings, rank the text sections by relevance to the query
3. Ask (once per query)
   1. Insert the question and the most relevant sections into a message to GPT
   2. Return GPT's answer
Costs
Because GPT is more expensive than embeddings search, a system with a decent volume of
queries will have its costs dominated by step 3.
For gpt-3.5-turbo using ~1,000 tokens per query, it costs ~$0.002 per query, or ~500
queries per dollar (as of Apr 2023)
For gpt-4 , again assuming ~1,000 tokens per query, it costs ~$0.03 per query, or ~30
queries per dollar (as of Apr 2023)
Of course, exact costs will depend on the system specifics and usage patterns.
Preamble
# imports
import ast # for converting embeddings saved as strings back to arrays
from openai import OpenAI # for calling the OpenAI API
import pandas as pd # for storing text and embeddings data
import tiktoken # for counting tokens
import os # for getting API token from env variable OPENAI_API_KEY
from scipy import spatial # for calculating vector similarities for search
# models
EMBEDDING_MODEL = "text-embedding-ada-002"
GPT_MODEL = "gpt-3.5-turbo"
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
If you need to install any of the libraries above, run pip install {library_name} in your
terminal.
(You can also do this in a notebook cell with !pip install openai or %pip install openai .)
After installing, restart the notebook kernel so the libraries can be loaded.
The OpenAI library will try to read your API key from the OPENAI_API_KEY environment variable.
If you haven't already, you can set this environment variable by following these instructions.
Because the training data for gpt-3.5-turbo and gpt-4 mostly ends in September 2021, the
models cannot answer questions about more recent events, such as the 2022 Winter Olympics.
For example, let's try asking 'Which athletes won the gold medal in curling in 2022?':
query = 'Which athletes won the gold medal in curling in 2022?'

response = client.chat.completions.create(
messages=[
{'role': 'system', 'content': 'You answer questions about the 2022 Winter Olympics.'},
{'role': 'user', 'content': query},
],
model=GPT_MODEL,
temperature=0,
)
print(response.choices[0].message.content)
As an AI language model, I don't have real-time data. However, I can provide you with general i
In this case, the model has no knowledge of 2022 and is unable to answer the question.
You can give GPT knowledge about a topic by inserting it into an input
message
To help give the model knowledge of curling at the 2022 Winter Olympics, we can copy and
paste the top half of a relevant Wikipedia article into our message:
Article
Talk
Read
Edit
View history
From Wikipedia, the free encyclopedia
Curling
at the XXIV Olympic Winter Games
Curling pictogram.svg
Curling pictogram
Venue Beijing National Aquatics Centre
Dates 2–20 February 2022
No. of events 3 (1 men, 1 women, 1 mixed)
Competitors 114 from 14 nations
← 20182026 →
Men's curling
at the XXIV Olympic Winter Games
Medalists
1st place, gold medalist(s) Sweden
2nd place, silver medalist(s) Great Britain
3rd place, bronze medalist(s) Canada
Women's curling
at the XXIV Olympic Winter Games
Medalists
1st place, gold medalist(s) Great Britain
2nd place, silver medalist(s) Japan
3rd place, bronze medalist(s) Sweden
Mixed doubles's curling
at the XXIV Olympic Winter Games
Medalists
1st place, gold medalist(s) Italy
2nd place, silver medalist(s) Norway
3rd place, bronze medalist(s) Sweden
Curling at the
2022 Winter Olympics
Curling pictogram.svg
Qualification
Statistics
Tournament
Men
Women
Mixed doubles
vte
The curling competitions of the 2022 Winter Olympics were held at the Beijing National Aquatics Centr
In each of the men's, women's, and mixed doubles competitions, 10 nations competed. The mixed doubles
Qualification
Main article: Curling at the 2022 Winter Olympics – Qualification
Qualification to the Men's and Women's curling tournaments at the Winter Olympics was determined thro
For the mixed doubles competition in 2022, the tournament field was expanded from eight competitor na
Summary
Nations Men Women Mixed doubles Athletes
Australia Yes 2
Canada Yes Yes Yes 12
China Yes Yes Yes 12
Czech Republic Yes 2
Denmark Yes Yes 10
Great Britain Yes Yes Yes 10
Italy Yes Yes 6
Japan Yes 5
Norway Yes Yes 6
ROC Yes Yes 10
South Korea Yes 5
Sweden Yes Yes Yes 11
Switzerland Yes Yes Yes 12
United States Yes Yes Yes 11
Total: 14 NOCs 10 10 10 114
Competition schedule
The Beijing National Aquatics Centre served as the venue of the curling competitions.
Curling competitions started two days before the Opening Ceremony and finished on the last day of the
Skip: Ma Xiuyue
Third: Zou Qiang
Second: Wang Zhiyu
Lead: Xu Jingtao
Alternate: Jiang Dongxu
Women
Canada China Denmark Great Britain Japan
Skip: Jennifer Jones
Third: Kaitlyn Lawes
Second: Jocelyn Peterman
Lead: Dawn McEwen
Alternate: Lisa Weagle
Skip: Han Yu
Third: Wang Rui
Second: Dong Ziqi
Lead: Zhang Lijun
Alternate: Jiang Xindi
Mixed doubles
Australia Canada China Czech Republic Great Britain
Female: Tahli Gill
Male: Dean Hewitt
query = f"""Use the below article on the 2022 Winter Olympics to answer the subsequent question. If t
Article:
\"\"\"
{wikipedia_article_on_curling}
\"\"\"
Question: Which athletes won the gold medal in curling at the 2022 Winter Olympics?"""
response = client.chat.completions.create(
messages=[
{'role': 'system', 'content': 'You answer questions about the 2022 Winter Olympics.'},
{'role': 'user', 'content': query},
],
model=GPT_MODEL,
temperature=0,
)
print(response.choices[0].message.content)
In the men's curling event, the gold medal was won by Sweden. In the women's curling event, the
Thanks to the Wikipedia article included in the input message, GPT answers correctly.
In this particular case, GPT was intelligent enough to realize that the original question was
underspecified, as there were three curling gold medal events, not just one.
Of course, this example partly relied on human intelligence. We knew the question was about
curling, so we inserted a Wikipedia article on curling.
The rest of this notebook shows how to automate this knowledge insertion with embeddings-
based search.
To save you the time & expense, we've prepared a pre-embedded dataset of a few hundred
Wikipedia articles about the 2022 Winter Olympics.
To see how we constructed this dataset, or to modify it yourself, see Embedding Wikipedia
articles for search.
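The code that downloads and loads the pre-embedded dataset isn't shown here; a minimal sketch, assuming the cookbook's published CSV of embedded Wikipedia sections (treat the URL as an assumption):

embeddings_path = "https://cdn.openai.com/API/examples/data/winter_olympics_2022.csv"

df = pd.read_csv(embeddings_path)
# the embedding column is stored as a string in the CSV; convert it back to a list of floats
df["embedding"] = df["embedding"].apply(ast.literal_eval)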
text embedding
2 Lviv bid for the 2022 Winter Olympics\n\n==Ven... [-0.00915789045393467, -0.008366798982024193, ...
2. Search
Takes a user query and a dataframe with text & embedding columns
Uses distance between query embedding and text embeddings to rank the texts
# search function
def strings_ranked_by_relatedness(
query: str,
df: pd.DataFrame,
relatedness_fn=lambda x, y: 1 - spatial.distance.cosine(x, y),
top_n: int = 100
) -> tuple[list[str], list[float]]:
"""Returns a list of strings and relatednesses, sorted from most related to least."""
query_embedding_response = client.embeddings.create(
model=EMBEDDING_MODEL,
input=query,
)
query_embedding = query_embedding_response.data[0].embedding
strings_and_relatednesses = [
(row["text"], relatedness_fn(query_embedding, row["embedding"]))
for i, row in df.iterrows()
]
strings_and_relatednesses.sort(key=lambda x: x[1], reverse=True)
strings, relatednesses = zip(*strings_and_relatednesses)
return strings[:top_n], relatednesses[:top_n]
# examples
strings, relatednesses = strings_ranked_by_relatedness("curling gold medal", df, top_n=5)
for string, relatedness in zip(strings, relatednesses):
print(f"{relatedness=:.3f}")
display(string)
relatedness=0.879
relatedness=0.872
relatedness=0.869
'Curling at the 2022 Winter Olympics\n\n==Results summary==\n\n===Mixed doubles tournament===\n
relatedness=0.868
3. Ask
With the search function above, we can now automatically retrieve relevant knowledge and
insert it into messages to GPT.
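query_message below relies on a num_tokens helper that isn't shown in this extract; a minimal sketch using the tiktoken import above:

def num_tokens(text: str, model: str = GPT_MODEL) -> int:
    """Return the number of tokens in a string for the given model."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))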
def query_message(
query: str,
df: pd.DataFrame,
model: str,
token_budget: int
) -> str:
"""Return a message for GPT, with relevant source texts pulled from a dataframe."""
strings, relatednesses = strings_ranked_by_relatedness(query, df)
introduction = 'Use the below articles on the 2022 Winter Olympics to answer the subsequent quest
question = f"\n\nQuestion: {query}"
message = introduction
for string in strings:
next_article = f'\n\nWikipedia article section:\n"""\n{string}\n"""'
if (
num_tokens(message + next_article + question, model=model)
> token_budget
):
break
else:
message += next_article
return message + question
def ask(
query: str,
df: pd.DataFrame = df,
model: str = GPT_MODEL,
token_budget: int = 4096 - 500,
print_message: bool = False,
) -> str:
"""Answers a query using GPT and a dataframe of relevant texts and embeddings."""
message = query_message(query, df, model=model, token_budget=token_budget)
if print_message:
print(message)
messages = [
{"role": "system", "content": "You answer questions about the 2022 Winter Olympics."},
{"role": "user", "content": message},
]
response = client.chat.completions.create(
model=model,
messages=messages,
temperature=0
)
response_message = response.choices[0].message.content
return response_message
Example questions
Finally, let's ask our system our original question about gold medal curlers:
ask('Which athletes won the gold medal in curling at the 2022 Winter Olympics?')
"In the men's curling tournament, the gold medal was won by the team from Sweden, consisting of
Despite gpt-3.5-turbo having no knowledge of the 2022 Winter Olympics, our search system
was able to retrieve reference text for the model to read, allowing it to correctly list the gold
medal winners in the Men's and Women's tournaments.
However, it still wasn't quite perfect—the model failed to list the gold medal winners from the
Mixed doubles event.
In this particular case, looking at the text below, it looks like the #1 article given to the model
did contain medalists for all three events, but the later results emphasized the Men's and
Women's tournaments, which may have distracted the model from giving a more complete
answer.
# set print_message=True to see the source text GPT was working off of
ask('Which athletes won the gold medal in curling at the 2022 Winter Olympics?', print_message=True)
Use the below articles on the 2022 Winter Olympics to answer the subsequent question. If the an
==Curling==
Knowing that this mistake was due to imperfect reasoning in the ask step, rather than imperfect
retrieval in the search step, let's focus on improving the ask step.
The easiest way to improve results is to use a more capable model, such as GPT-4 . Let's try it.
ask('Which athletes won the gold medal in curling at the 2022 Winter Olympics?', model="gpt-4")
"The athletes who won the gold medal in curling at the 2022 Winter Olympics are:\n\nMen's tourn
GPT-4 succeeds perfectly, correctly identifying all 12 gold medal winners in curling.
More examples
Below are a few more examples of the system in action. Feel free to try your own questions, and
see how it does. In general, search-based systems do best on questions that have a simple
lookup, and worst on questions that require multiple partial sources to be combined and
reasoned about.
# counting question
ask('How many records were set at the 2022 Winter Olympics?')
# comparison question
ask('Did Jamaica or Cuba have more athletes at the 2022 Winter Olympics?')
"Jamaica had more athletes at the 2022 Winter Olympics. According to the provided information,
# subjective question
ask('Which Olympic sport is the most entertaining?')
"In the marsh, the Shoebill stands tall and stark,\nWith a grace that lights up the day's dark.
# misspelled question
ask('who winned gold metals in kurling at the olimpics')
"According to the provided information, the gold medal winners in curling at the 2022 Winter Ol
'COVID-19 had several impacts on the 2022 Winter Olympics. Here are some of the effects:\n\n1.
Retrieval Augmentation for GPT-4 using
Pinecone
James Briggs
Mar 23, 2023
In this notebook we will learn how to query relevant contexts to our queries from Pinecone, and
pass these to a GPT-4 model to generate an answer backed by real data sources.
GPT-4 is a big step up from previous OpenAI completion models. It also exclusively uses the
ChatCompletion endpoint, so we must use it in a slightly different way than usual. However, the
power of the model makes the change worthwhile, particularly when augmented with an
external knowledge base like the Pinecone vector database.
In this example, we will download the LangChain docs from langchain.readthedocs.io/. We get
all .html files located on the site like so:
<Response [200]>
This downloads all HTML into the rtdocs directory. Now we can use LangChain itself to
process these docs. We do this using the ReadTheDocsLoader like so:
loader = ReadTheDocsLoader('rtdocs')
docs = loader.load()
len(docs)
.rst .pdf Welcome to LangChain Contents Getting Started Modules Use Cases Reference Docs LangCh
This leaves us with hundreds of processed doc pages. Let's take a look at the format each one
contains:
docs[0]
print(docs[5].page_content)
docs[5].metadata['source'].replace('rtdocs/', 'https://')
data = []
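The loop that fills data isn't shown; a sketch under the assumption that each record pairs a reconstructed URL with the page text (matching the record printed below):

# Build one record per processed doc: its reconstructed URL plus the raw page text
for doc in docs:
    data.append({
        'url': doc.metadata['source'].replace('rtdocs/', 'https://'),
        'text': doc.page_content,
    })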
data[3]
{'url': 'https://langchain.readthedocs.io/en/latest/modules/memory/types/entity_summary_memory.
'text': '.ipynb .pdf Entity Memory Contents Using in a chain Inspecting the memory store Entit
It's pretty ugly but it's good enough for now. Let's see how we can process all of these. We will
chunk everything into ~400 token chunks, we can do this easily with langchain and tiktoken :
import tiktoken
tokenizer = tiktoken.get_encoding('p50k_base')
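The text splitter below uses a tiktoken_len length function that isn't defined in this extract; a minimal sketch built on the tokenizer above:

def tiktoken_len(text):
    # Count tokens with the p50k_base tokenizer so chunk sizes are measured in tokens
    tokens = tokenizer.encode(text, disallowed_special=())
    return len(tokens)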
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=400,
chunk_overlap=20,
length_function=tiktoken_len,
separators=["\n\n", "\n", " ", ""]
)
chunks = []
Our chunks are ready so now we move onto embedding and indexing everything.
We use text-embedding-3-small as the embedding model. We can embed text like so:
import openai
embed_model = "text-embedding-3-small"
res = openai.Embedding.create(
input=[
"Sample document text goes here",
"there will be several phrases in each batch"
], engine=embed_model
)
In the response res we will find a JSON-like object containing our new embeddings within the
'data' field.
res.keys()
Inside 'data' we will find two records, one for each of the two sentences we just embedded.
Each vector embedding contains 1536 dimensions (the output dimensionality of the
text-embedding-3-small model).
len(res['data'])
len(res['data'][0]['embedding']), len(res['data'][1]['embedding'])
(1536, 1536)
We will apply this same embedding logic to the langchain docs dataset we've just scraped. But
before doing so we must create a place to store the embeddings.
Now we need a place to store these embeddings and enable an efficient vector search through
them all. To do that we use Pinecone; we can get a free API key and enter it below, where we
will initialize our connection to Pinecone and create a new index.
import pinecone
index_name = 'gpt-4-langchain-docs'
{'dimension': 1536,
'index_fullness': 0.0,
'namespaces': {},
'total_vector_count': 0}
We can see the index is currently empty, with a total_vector_count of 0 . We can begin
populating it with embeddings built with OpenAI's text-embedding-3-small model.
Now we've added all of our langchain docs to the index. With that we can move on to retrieval
and then answer generation using GPT-4.
Retrieval
To search through our documents we first need to create a query vector xq . Using xq we will
retrieve the most relevant chunks from the LangChain docs, like so:
res = openai.Embedding.create(
input=[query],
engine=embed_model
)
res
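The steps that extract the query vector, retrieve contexts from Pinecone, and define the primer system message aren't shown in this extract; a sketch under the assumption that index is the Pinecone index created earlier and that the older pinecone-client query interface is in use:

# Extract the query embedding and retrieve the most relevant chunks from Pinecone
xq = res['data'][0]['embedding']
res = index.query(xq, top_k=5, include_metadata=True)
contexts = [item['metadata']['text'] for item in res['matches']]

# System primer instructing the model to answer only from the provided contexts
primer = """You are Q&A bot. A highly intelligent system that answers user questions
based on the information provided by the user above each question. If the information
can not be found in the information provided by the user you truthfully say "I don't know".
"""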
With retrieval complete, we move on to feeding these into GPT-4 to produce answers.
GPT-4 is currently accessed via the ChatCompletions endpoint of OpenAI. To add the
information we retrieved into the model, we need to pass it into our user prompts alongside our
original query. We can do that like so:
augmented_query = "\n\n---\n\n".join(contexts)+"\n\n-----\n\n"+query
print(augmented_query)
.rst .pdf Chains Chains# Using an LLM in isolation is fine for some simple applications, but ma
---
.rst .pdf LLMs LLMs# Large Language Models (LLMs) are a core component of LangChain. LangChain
---
.ipynb .pdf Getting Started Contents Why do we need chains? Query an LLM with the LLMChain Comb
---
chain first uses a LLM to construct the url to hit, then makes that request with the Requests w
---
Prompts: This includes prompt management, prompt optimization, and prompt serialization. LLMs:
-----
res = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{"role": "system", "content": primer},
{"role": "user", "content": augmented_query}
]
)
display(Markdown(res['choices'][0]['message']['content']))
<IPython.core.display.Markdown object>
res = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{"role": "system", "content": primer},
{"role": "user", "content": query}
]
)
display(Markdown(res['choices'][0]['message']['content']))
<IPython.core.display.Markdown object>
res = openai.ChatCompletion.create(
model="gpt-4",
messages=[
{"role": "system", "content": "You are Q&A bot. A highly intelligent system that answers user
{"role": "user", "content": query}
]
)
display(Markdown(res['choices'][0]['message']['content']))
<IPython.core.display.Markdown object>
Function-calling with an OpenAPI specification
Shyamal Anadkat, Simón Fishman
Oct 14, 2023
Much of the internet is powered by RESTful APIs. Giving GPT the ability to call them opens up a
world of possibilities. This notebook demonstrates how GPTs can be used to intelligently call
APIs. It leverages OpenAPI specifications and chained function calls.
The OpenAPI Specification (OAS) is a universally accepted standard for describing the details of
RESTful APIs in a format that machines can read and interpret. It enables both humans and
computers to understand the capabilities of a service, and it can be leveraged to show GPT how
to call APIs.
In this notebook we will cover:
1. How to convert a sample OpenAPI specification into a list of function definitions for the chat completions API.
2. How to use the chat completions API to intelligently invoke these functions based on user instructions.
import os
import json
import jsonref
from openai import OpenAI
import requests
from pprint import pp
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
The example OpenAPI spec we use here was created using gpt-4 . We will transform this
sample spec into a set of function definitions that can be supplied to the chat completion API.
The model, based on the provided user instructions, generates a JSON object containing the
necessary arguments to call these functions.
Before we proceed, let's inspect this generated spec. OpenAPI specs include details about the
API's endpoints, the operations they support, the parameters they accept, the requests they can
handle, and the responses they return. The spec is defined in JSON format.
The spec describes a simple event-management API, with operations such as:
Listing all events
Creating a new event
Retrieving an event by ID
Deleting an event by ID
Updating an event's details by ID
Each operation in the spec has an operationId , which we will use as the function name when
we parse the spec into function specifications. The spec also includes schemas that define the
data types and structures of the parameters for each operation.
You can see the schema here:
display(openapi_spec)
'required': True,
'schema': {'type': 'string'}}],
'responses': {'204': {'description': 'The event was deleted'}}},
'patch': {'summary': "Update an event's details by ID",
'operationId': 'updateEventDetails',
'parameters': [{'name': 'id',
'in': 'path',
'required': True,
'schema': {'type': 'string'}}],
'requestBody': {'required': True,
'content': {'application/json': {'schema': {'type': 'object',
'properties': {'name': {'type': 'string'},
'date': {'type': 'string', 'format': 'date-time'},
'location': {'type': 'string'}},
'required': ['name', 'date', 'location']}}}},
'responses': {'200': {'description': "The event's details were updated",
'content': {'application/json': {'schema': {'type': 'object',
'properties': {'id': {'type': 'string'},
'name': {'type': 'string'},
'date': {'type': 'string', 'format': 'date-time'},
'location': {'type': 'string'}},
'required': ['name', 'date', 'location']}}}}}}}},
'components': {'schemas': {'Event': {'type': 'object',
'properties': {'id': {'type': 'string'},
'name': {'type': 'string'},
'date': {'type': 'string', 'format': 'date-time'},
'location': {'type': 'string'}},
'required': ['name', 'date', 'location']}}}}
Now that we have a good understanding of the OpenAPI spec, we can proceed to parse it into function specifications. Each function specification includes a name (which corresponds to the operation identifier of the API endpoint as defined in the OpenAPI specification), a description, and parameters.
For each of the endpoints defined in the schema, we need to do the following:
1. Resolve JSON references: In an OpenAPI specification, it's common to use JSON references
(also known as $ref) to avoid duplication. These references point to definitions that are
used in multiple places. For example, if multiple API endpoints return the same object
structure, that structure can be defined once and then referenced wherever it's needed. We
need to resolve and replace these references with the content they point to.
2. Extract a name for the functions: We will simply use the operationId as the function name.
Alternatively, we could use the endpoint path and operation as the function name.
3. Extract a description and parameters: We will iterate through the description , summary ,
requestBody and parameters fields to populate the function's description and
parameters.
def openapi_to_functions(openapi_spec):
    functions = []
    for path, methods in openapi_spec["paths"].items():
        for method, spec_with_ref in methods.items():
            # 1. Resolve JSON references so each operation spec is self-contained
            spec = jsonref.replace_refs(spec_with_ref)
            # 2. Extract a name for the function (the operationId)
            function_name = spec.get("operationId")
            # 3. Extract a description and build the parameters schema
            desc = spec.get("description") or spec.get("summary", "")
            schema = {"type": "object", "properties": {}}
            req_body = (
                spec.get("requestBody", {})
                .get("content", {})
                .get("application/json", {})
                .get("schema")
            )
            if req_body:
                schema["properties"]["requestBody"] = req_body
            params = spec.get("parameters", [])
            if params:
                schema["properties"]["parameters"] = {
                    "type": "object",
                    "properties": {p["name"]: p["schema"] for p in params if "schema" in p},
                }
            functions.append(
                {"type": "function",
                 "function": {"name": function_name, "description": desc, "parameters": schema}}
            )
    return functions
functions = openapi_to_functions(openapi_spec)
{'type': 'function',
'function': {'name': 'listEvents',
'description': 'List all events',
'parameters': {'type': 'object', 'properties': {}}}}
{'type': 'function',
'function': {'name': 'createEvent',
'description': 'Create a new event',
'parameters': {'type': 'object',
'properties': {'requestBody': {'type': 'object',
'properties': {'id': {'type': 'stri
'name': {'type': 'st
'date': {'type': 'st
'format': '
'location': {'type':
'required': ['name',
'date',
'location']}}}}}
{'type': 'function',
'function': {'name': 'getEventById',
'description': 'Retrieve an event by ID',
'parameters': {'type': 'object',
'properties': {'parameters': {'type': 'object',
'properties': {'id': {'type': 'strin
{'type': 'function',
'function': {'name': 'deleteEvent',
'description': 'Delete an event by ID'
It's important to note that the chat completions API does not execute the function; instead, it
generates the JSON that you can use to call the function in your own code.
SYSTEM_MESSAGE = """
You are a helpful assistant.
Respond to the following prompt by using function_call and then summarize actions.
Ask for clarification if a user request is ambiguous.
"""
def process_user_instruction(functions, instruction, max_calls=5):  # call cap is an assumption
    num_calls = 0
    messages = [
        {"role": "system", "content": SYSTEM_MESSAGE},
        {"role": "user", "content": instruction},
    ]
    while num_calls < max_calls:
        response = client.chat.completions.create(
            model="gpt-4", messages=messages, tools=functions  # model choice here is illustrative
        )
        message = response.choices[0].message
        print(message)
        try:
            messages.append(message)
            # For the sake of this example, we'll simply add a message to simulate success.
            # Normally, you'd want to call the function here, and append the results to messages.
            messages.append(
                {
                    "role": "tool",
                    "content": "success",
                    "tool_call_id": message.tool_calls[0].id,
                }
            )
            num_calls += 1
        except:
            print("\n>> Message:\n")
            print(message.content)
            break
USER_INSTRUCTION = """
Instruction: Get all the events.
Then create a new event named AGI Party.
Then delete event with id 2456.
"""
process_user_instruction(functions, USER_INSTRUCTION)
[ChatCompletionMessageToolCall(id='call_jmlvEyMRMvOtB80adX9RbqIV', function=Function(arguments=
ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompl
[ChatCompletionMessageToolCall(id='call_OOPOY7IHMq3T7Ib71JozlUQJ', function=Function(arguments=
ChatCompletionMessage(content=None, role='assistant', function_call=None, tool_calls=[ChatCompl
[ChatCompletionMessageToolCall(id='call_Kxluu3fJSOsZNNCn3JIlWAAM', function=Function(arguments=
ChatCompletionMessage(content='Here are the actions I performed:\n\n1. Retrieved all the events
None
>> Message:
Conclusion
We have demonstrated how to convert OpenAPI specs into function specifications that can be
given to GPT for it to intelligently call them, and shown how these can be chained together to
perform complex operations.
Possible extensions of this system could include handling more complex user instructions that
require conditional logic or looping, integrating with real APIs to perform actual operations, and
improving error handling and validation to ensure the instructions are feasible and the function
calls are successful.
Note: To answer questions based on text documents, we recommend the procedure in Question
Answering using Embeddings. Some of the code below may rely on deprecated API endpoints.
This is expensive, and will also take a long time, as we call the davinci engine for each section.
You can simply download the final dataset instead.
Create a context by concatenating the title, the heading and the content of that section
import pandas as pd
df = pd.read_csv('olympics-data/olympics_sections.csv')
df['context'] = df.title + "\n" + df.heading + "\n\n" + df.content
df.head()
   title                 heading                                         content                                             tokens  context
0  2020 Summer Olympics  Summary                                         The 2020 Summer Olympics (Japanese: 2020年夏季オリン...     713  2020 Summer Olympics\nSummary\n\nThe 2020 Summ...
1  2020 Summer Olympics  Host city selection                             The International Olympic Committee (IOC) vote...      126  2020 Summer Olympics\nHost city selection\n\nT...
2  2020 Summer Olympics  Impact of the COVID-19 pandemic                 In January 2020, concerns were raised about th...      369  2020 Summer Olympics\nImpact of the COVID-19 p...
3  2020 Summer Olympics  Qualifying event cancellation and postponement  Concerns about the pandemic began to affect qu...      298  2020 Summer Olympics\nQualifying event cancell...
4  2020 Summer Olympics  Effect on doping tests                          Mandatory doping tests were being severely res...      163  2020 Summer Olympics\nEffect on doping tests\n...
Note: We have used temperature=0, but it may be beneficial to experiment with a higher
temperature to get a higher diversity of questions.
WARNING: This step will take a long time and consume a lot of tokens, as it calls davinci-instruct for every section to generate a number of questions.
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
def get_questions(context):
    try:
        # davinci-instruct-beta-v3 is a legacy completions model, so we use the
        # completions endpoint (with prompt=...) rather than chat completions
        response = client.completions.create(
            model="davinci-instruct-beta-v3",
            prompt=f"Write questions based on the text below\n\nText: {context}\n\nQuestions:\n1.",
            temperature=0,
            max_tokens=257,
            top_p=1,
            frequency_penalty=0,
            presence_penalty=0,
            stop=["\n\n"]
        )
        return response.choices[0].text
    except:
        return ""
df['questions']= df.context.apply(get_questions)
df['questions'] = "1." + df.questions
print(df[['questions']].values[0][0])
The prompt is designed to generate a number of questions. Example questions above were
generated based on the summary section of the 2020 Summer Olympics page.
We can observe that questions 3 and 5 above repeat. Sometimes the generated questions could be ambiguous without the context. We will show that, even despite these limitations, we can create a successful model.
print(df.content.values[0])
The 2020 Summer Olympics (Japanese: 2020年夏季オリンピック, Hepburn: Nisen Nijū-nen Kaki Orinpikk
Tokyo was selected as the host city during the 125th IOC Session in Buenos Aires, Argentina, on
New events were introduced in existing sports for 2020, including 3x3 basketball, freestyle BMX
Use davinci-instruct to answer the questions given the relevant Wikipedia section contents
Note: We have used temperature=0, but it may be beneficial to experiment with a higher
temperature to get a higher diversity of questions.
WARNING: This step will take a long time and consume a lot of tokens, as it calls davinci-instruct for every section to answer all the questions.
def get_answers(row):
    try:
        # again, a legacy completions model, so we use the completions endpoint
        response = client.completions.create(
            model="davinci-instruct-beta-v3",
            prompt=f"Write answer based on the text below\n\nText: {row.context}\n\nQuestions:\n{row.questions}\n\nAnswers:\n1.",
            temperature=0,
            max_tokens=257,
            top_p=1,
            frequency_penalty=0,
            presence_penalty=0
        )
        return response.choices[0].text
    except Exception as e:
        print(e)
        return ""
1. The 2020 Summer Olympics is an international multi-sport event held from 23 July to 8 August
2. The 2020 Summer Olympics took place from 23 July to 8 August 2021.
3. The United States topped the medal count by both total golds (39) and total medals (113), wi
4. The United States topped the medal count by both total golds (39) and total medals (113), wi
5. The United States topped the medal count by both total golds (39) and total medals (113), wi
These are the answers to the questions above based on the context around the host city
selection.
We can see that answers 3-5 contain the correct answer, but instead of answering the question
directly, the answer is a verbatim extraction. Despite these occasional lower quality answers, we
will show that the model can learn the task reasonably well, given a high number of examples.
df.to_csv('olympics-data/olympics_qa.csv', index=False)
We create a search file (see the Files API reference), which can be used to retrieve the relevant context when a question is asked.
DEPRECATED: The /search endpoint is deprecated in favour of using embeddings. Embeddings are cheaper, faster, and can support a better search experience. See the Question Answering Guide for a search implementation using embeddings.
df = df[df.tokens<2000]
df[['context', 'tokens']].rename(columns={'context':'text','tokens':'metadata'}).to_json('olympics-da
search_file = client.files.create(
    file=open("olympics-data/olympics_search.jsonl", "rb"),
    purpose='search'
)
olympics_search_fileid = search_file['id']
We will use a simple implementation of the answers endpoint. This works by using the /search endpoint, which searches over an indexed file to obtain the relevant sections to include in the context, followed by a question-answering prompt given to a specified model.
The women's 4 × 100 metres relay event at the 2020 Summer Olympics took place on 5 and 6 August
###
National Olympic Committees (NOCs) could qualify one relay team in one of three following ways:
The top 8 NOCs at the 2019 World Athletics Championships qualified a relay team.
The top 8 NOCs at the 2021 World Athletics Relays qualified a relay team.
Where an NOC placed in the top 8 at both the 2019 World Championships and the 2021 World Relays
The qualifying period was originally from 1 May 2019 to 29 June 2020. Due to the COVID-19 pande
answer_question(olympics_search_fileid, "davinci-instruct-beta-v3",
"Where did women's 4 x 100 metres relay event take place during the 2020 Summer Olympics?
After we fine-tune the model for Q&A we'll be able to use it instead of davinci-instruct-beta-v3 to obtain better answers when the question can't be answered based on the context. We can already see a downside of davinci-instruct-beta-v3: it always attempts to answer the question, regardless of whether the relevant context is present. (Note that the second question below asks about a future event, set in 2048.)
answer_question(olympics_search_fileid, "davinci-instruct-beta-v3",
"Where did women's 4 x 100 metres relay event take place during the 2048 Summer Olympics?
We can see that davinci has a tendency to answer the question even when the question can't be answered given the context provided. Note that the question asked about the 2048 Summer Olympics, which hasn't happened yet, while the retrieved content only returned results for 2020.
def check_context(title, heading, question, max_len=1800, search_model='ada', max_rerank=10):  # default values are assumed
    """
    Evaluate how well the search model retrieves the correct context

    Parameters
    ----------
    title: str
        The title of the Wikipedia page
    heading: str
        The heading of the Wikipedia section
    question: str
        The question
    max_len: int
        The maximum length of the context
    search_model: str
        The search model to use - `ada` is most cost effective
    max_rerank: int
        The maximum number of reranking documents to use the search model on

    Returns
    -------
    rank: int
        The rank of the correct context
    token_length: int
        The number of tokens needed to obtain the correct context
    """
    try:
        # TODO: openai.Engine(search_model) is deprecated
        results = openai.Engine(search_model).search(
            search_model=search_model,
            query=question,
            max_rerank=max_rerank,
            file=olympics_search_fileid,
            return_metadata=True
        )
        index = -1
        returns = []
        cur_len = 0
        for result in results['data']:
            cur_len += int(result['metadata']) + 4  # we add 4 tokens for the separator `\n\n###\n\n`
            if cur_len > max_len:
                break
            returns.append(result['text'])
            res = result['text'].split('\n')
            if res[0] == title and res[1] == heading:
                index = len(returns) - 1
                break
        return index, cur_len
    except Exception as e:
        #print (e)
        return []
print(check_context("Athletics at the 2020 Summer Olympics – Women's 4 × 100 metres relay", "Summary"
(0, 58)
We utilize the generated questions based on context to estimate how often we can retrieve the
original context. These questions are noisy, so this is not a perfect estimate.
Our questions and answers are prefixed with numbered bullet points, however due to the way
they were generated, they are missing the first number, hence we add "1." to the list of
questions (and answers).
We calculate the rank of the section retrieved using ada search, and the number of tokens in the
context needed to retrieve the relevant section in full.
ada_results = df.apply(lambda x: [
check_context( x.title,
x.heading,
q[3:], # remove the number prefix
max_len=1000000, # set a large number to get the full context
search_model='ada',
max_rerank=200,
)
for q in (x.questions).split('\n') # split the questions
if len(q) >10 # remove the empty questions
], axis=1)
ada_results.head()
def expand_lists(out):
    """
    Expand a pandas series containing lists into a series, where each list element becomes a value on its own row.
    """
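    # The implementation is not shown in this extract; one simple version,
    # assuming `out` is a DataFrame whose cells contain lists (here the 'ada' column):
    return out.explode(list(out.columns)).reset_index(drop=True)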
out_expanded = expand_lists(out)
out_expanded['rank'] = out_expanded.ada.apply(lambda x: x[0] if x != [] else -2)
out_expanded['tokens'] = out_expanded.ada.apply(lambda x: x[1] if x != [] else -2)
7.4% of relevant paragraphs are not retrieved within the first 200 results
7.4% of the time, this is due to the keyword search part of the search algorithm not retrieving
the relevant context within the first 200 results. 18.3% of the time this is due to the semantic
search not placing the relevant context within the first 2000 tokens.
# normalized value_counts
out_expanded['rank'].value_counts(normalize=True).sort_index()[:13]
-2 0.000063
-1 0.074428
0 0.453420
1 0.089515
2 0.047146
3 0.032437
4 0.024139
5 0.019676
6 0.015967
7 0.013452
8 0.011189
9 0.009869
10 0.009178
Name: rank, dtype: float64
Probabilities of the relevant context being returned at each rank (-2 means a processing error, -1 means the rank is >200).
OpenAI functions enable your app to take action based on user inputs. This means that it can,
e.g., search the web, send emails, or book tickets on behalf of your users, making it more
powerful than a regular chatbot.
In this tutorial, you will build an app that uses OpenAI functions along with the latest version of
the Node.js SDK. The app runs in the browser, so you only need a code editor and, e.g., VS Code
Live Server to follow along locally. Alternatively, write your code directly in the browser via this
code playground at Scrimba.
Our app is a simple agent that helps you find activities in your area. It has access to two
functions, getLocation() and getCurrentWeather() , which means it can figure out where
you’re located and what the weather is at the moment.
At this point, it's important to understand that OpenAI doesn't execute any code for you. It just
tells your app which functions it should use in a given scenario, and then leaves it up to your
app to invoke them.
Once our agent knows your location and the weather, it'll use GPT’s internal knowledge to
suggest suitable local activities for you.
Since we're running our code in a browser environment at Scrimba, we also need to set
dangerouslyAllowBrowser: true to confirm we understand the risks involved with client-side
API requests. Please note that you should move these requests over to a Node server in a
production app.
Next, we'll create the two functions. The first one - getLocation - uses the IP API to get the
location of the user.
The IP API returns a bunch of data about your location, including your latitude and longitude,
which we’ll use as arguments in the second function getCurrentWeather . It uses the Open
Meteo API to get the current weather data, like this:
For OpenAI to understand the purpose of these functions, we need to describe them using a specific schema. We'll create an array called tools that contains one object per function. Each object will have two keys, type and function , and the function key has three subkeys: name , description , and parameters .
const tools = [
{
type: "function",
function: {
name: "getCurrentWeather",
description: "Get the current weather in a given location",
parameters: {
type: "object",
properties: {
latitude: {
type: "string",
},
longitude: {
type: "string",
},
},
required: ["longitude", "latitude"],
},
}
},
{
type: "function",
function: {
name: "getLocation",
description: "Get the user's location based on their IP address",
parameters: {
type: "object",
properties: {},
},
}
},
];
The first object in the array should always have the role property set to "system" , which tells
OpenAI that this is how we want it to behave.
const messages = [
{
role: "system",
content:
"You are a helpful assistant. Only use the functions you have been provided with.",
},
];
We are now ready to build the logic of our app, which lives in the agent function. It is
asynchronous and takes one argument: the userInput .
We start by pushing the userInput to the messages array. This time, we set the role to
"user" , so that OpenAI knows that this is the input from the user.
Next, we'll send a request to the Chat completions endpoint via the chat.completions.create() method in the Node SDK. This method takes a configuration object, the most important properties being:
model - The model to use (here, gpt-4).
messages - The entire history of messages between the user and the AI up until this point.
tools - A list of tools the model may call. Currently, only functions are supported as a tool.
Let's try to run the agent with an input that requires a function call to give a suitable reply.
When we run the code above, we see the response from OpenAI logged out to the console like
this:
{
id: "chatcmpl-84ojoEJtyGnR6jRHK2Dl4zTtwsa7O",
object: "chat.completion",
created: 1696159040,
model: "gpt-4-0613",
choices: [{
index: 0,
message: {
role: "assistant",
content: null,
      tool_calls: [
        {
          id: "call_CBwbo9qoXUn1kTR5pPuv6vR1",
          type: "function",
          function: {
            name: "getLocation",
            arguments: "{}"
          }
        }
      ]
},
logprobs: null,
finish_reason: "tool_calls" // OpenAI wants us to call a function
}],
usage: {
prompt_tokens: 134,
completion_tokens: 6,
total_tokens: 140
}
system_fingerprint: null
}
This response tells us that we should call one of our functions, as it contains the following key: finish_reason: "tool_calls" . The name of the function to call is included in the tool call itself: "getLocation" .
Now that we have the name of the function as a string, we'll need to translate that into a
function call. To help us with that, we'll gather both of our functions in an object called
availableTools :
const availableTools = {
getCurrentWeather,
getLocation,
};
This is handy because we'll be able to access the getLocation function via bracket notation
and the string we got back from OpenAI, like this: availableTools["getLocation"] .
If we run the code again with the same input ( "Where am I located right now?" ), we'll see that functionResponse is an object filled with location data about where the user is located right now. In this case, it looks like this:
{ip: "193.212.60.170", network: "193.212.60.0/23", version: "IPv4", city: "Oslo", region: "Oslo Cou
We'll add this data to a new item in the messages array, where we also specify the name of the
function we called.
messages.push({
role: "function",
name: functionName,
content: `The result of the last function was this: ${JSON.stringify(
functionResponse
)}
`,
});
Notice that the role is set to "function" . This tells OpenAI that the content parameter
contains the result of the function call and not the input from the user.
At this point, we need to send a new request to OpenAI with this updated messages array.
However, we don’t want to hard code a new function call, as our agent might need to go back
and forth between itself and GPT several times until it has found the final answer for the user.
This can be solved in several different ways, e.g. recursion, a while-loop, or a for-loop. We'll use
a good old for-loop for the sake of simplicity.
If we get back finish_reason: "tool_calls" from GPT, we'll just push the result of the
function call to the messages array and jump to the next iteration of the loop, triggering a new
request.
If we get finish_reason: "stop" back, then GPT has found a suitable answer, so we'll return
the function and cancel the loop.
messages.push({
role: "function",
name: functionName,
content: `
The result of the last function was this: ${JSON.stringify(
functionResponse
)}
`,
});
} else if (finish_reason === "stop") {
messages.push(message);
return message.content;
}
}
return "The maximum number of iterations has been met without a suitable answer. Please try again w
If we don't see a finish_reason: "stop" within our five iterations, we'll return a message
saying we couldn’t find a suitable answer.
At this point, we are ready to try our app! I'll ask the agent to suggest some activities based on
my location and the current weather.
Based on your current location in Oslo, Norway and the weather (15°C and snowy),
here are some activity suggestions:
If we peek under the hood and log out response.choices[0].message in each iteration of the loop, we'll see that GPT has instructed us to use both our functions before coming up with an answer.
First, it tells us to call the getLocation function. Then it tells us to call the getCurrentWeather
function with "longitude": "10.859", "latitude": "59.955" passed in as the arguments. This
is data it got back from the first function call we did.
{"role":"assistant","content":null,"tool_calls":[{"id":"call_Cn1KH8mtHQ2AMbyNwNJTweEP","type":"func
{"role":"assistant","content":null,"tool_calls":[{"id":"call_uc1oozJfGTvYEfIzzcsfXfOl","type":"func
You've now built an AI agent using OpenAI functions and the Node.js SDK! If you're looking for
an extra challenge, consider enhancing this app. For example, you could add a function that
fetches up-to-date information on events and activities in the user's location.
Happy coding!
Complete code
You want to use Weaviate with the Generative OpenAI module (generative-openai).
Prerequisites
This cookbook only covers Generative Search examples; it doesn't cover the configuration and data imports.
In order to make the most of this cookbook, please complete the Getting Started cookbook
first, where you will learn the essentials of working with Weaviate and import the demo data.
Checklist:
===========================================================
If you don't have an OpenAI API key, you can get one from
https://beta.openai.com/account/api-keys.
Once you get your key, please add it to your environment variables as OPENAI_API_KEY .
# Test that your OpenAI API key is correctly set as an environment variable
# Note. if you run this notebook locally, you will need to reload your terminal and the notebook for the env variables to be live.
import os
# Note. alternatively you can set a temporary env variable like this:
# os.environ["OPENAI_API_KEY"] = 'your-key-goes-here'
1. test env variable OPENAI_API_KEY – make sure you completed the step in #Prepare-your-
OpenAI-API-key
The client
After this step, the client object will be used to perform all Weaviate-related operations.
import weaviate
from datasets import load_dataset
import os
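The client construction itself is not shown in this extract. With the v3 Python client used at the time, and assuming a placeholder WCS cluster URL, it presumably looks something like this:
client = weaviate.Client(
    url="https://your-wcs-instance-name.weaviate.network/",  # placeholder cluster URL
    additional_headers={
        "X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY"),  # pass your OpenAI key to the modules
    },
)
client.is_ready()  # should return True once the instance is reachable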
Generative Search
Weaviate offers a Generative Search OpenAI module, which generates responses based on the
data stored in your Weaviate instance.
The way you construct a generative search query is very similar to a standard semantic search query in Weaviate. For example, to search in "Articles" :
result = (
    client.query
    .get("Articles", ["title", "content", "url"])
    .with_near_text({"concepts": ["football clubs"]})
    .with_limit(5)
    # generative query will go here
    .do()
)
# Single-prompt generative search: the surrounding helper (name not shown in this
# extract) builds a `prompt` that is applied to each retrieved object individually.
result = (
    client.query
    .get(collection_name, ["title", "content", "url"])
    .with_near_text({ "concepts": [query], "distance": 0.7 })
    .with_limit(5)
    .with_generate(single_prompt=prompt)
    .do()
)
return result["data"]["Get"][collection_name]
def generative_search_group(query, collection_name):
    # Grouped-task generative search: one generated response for the whole result set.
    # `generateTask` (the grouped-task prompt) is assumed to be defined earlier in the notebook.
    result = (
        client.query
        .get(collection_name, ["title", "content", "url"])
        .with_near_text({ "concepts": [query], "distance": 0.7 })
        .with_generate(grouped_task=generateTask)
        .with_limit(5)
        .do()
    )
    return result["data"]["Get"][collection_name]
query_result = generative_search_group("football clubs", "Article")
print (query_result[0]['_additional']['generate']['groupedResult'])
Thanks for following along, you're now equipped to set up your own vector databases and use
embeddings to do all kinds of cool things - enjoy! For more complex use cases please continue
to work through other cookbook examples in this repo.
In this notebook we will look into building a RAG pipeline and evaluating it with LlamaIndex. It has the following three sections.
LLMs are trained on vast datasets, but these will not include your specific data. Retrieval-
Augmented Generation (RAG) addresses this by dynamically incorporating your data during the
generation process. This is done not by altering the training data of LLMs, but by allowing the
model to access and utilize your data in real-time to provide more tailored and contextually
relevant responses.
In RAG, your data is loaded and prepared for queries, or “indexed”. User queries act on the index, which filters your data down to the most relevant context. This context and your query then go to the LLM along with a prompt, and the LLM provides a response.
Even if what you’re building is a chatbot or an agent, you’ll want to know RAG techniques for
getting data into your application.
Stages within RAG
There are five key stages within RAG, which in turn will be a part of any larger application you
build. These are:
Loading: this refers to getting your data from where it lives – whether it’s text files, PDFs,
another website, a database, or an API – into your pipeline. LlamaHub provides hundreds of
connectors to choose from.
Indexing: this means creating a data structure that allows for querying the data. For LLMs this
nearly always means creating vector embeddings, numerical representations of the meaning of
your data, as well as numerous other metadata strategies to make it easy to accurately find
contextually relevant data.
Storing: Once your data is indexed, you will want to store your index, along with any other
metadata, to avoid the need to re-index it.
Querying: for any given indexing strategy there are many ways you can utilize LLMs and
LlamaIndex data structures to query, including sub-queries, multi-step queries and hybrid
strategies.
Evaluation: a critical step in any pipeline is checking how effective it is relative to other
strategies, or when you make changes. Evaluation provides objective measures of how accurate,
faithful and fast your responses to queries are.
Now that we have understood the significance of RAG systems, let's build a simple RAG pipeline.
# The nest_asyncio module enables the nesting of asynchronous functions within an already running async loop.
# This is necessary because Jupyter notebooks inherently operate in an asynchronous loop.
# By applying nest_asyncio, we can run additional async functions within this existing loop without conflicts.
import nest_asyncio
nest_asyncio.apply()
import os
import pandas as pd
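The LlamaIndex imports are not shown in this extract. Assuming the pre-0.10 llama_index package layout that this notebook appears to use, they would look roughly like this:
from llama_index import SimpleDirectoryReader, VectorStoreIndex, ServiceContext
from llama_index.node_parser import SimpleNodeParser
from llama_index.llms import OpenAI
from llama_index.evaluation import (
    generate_question_context_pairs,
    RetrieverEvaluator,
    FaithfulnessEvaluator,
    RelevancyEvaluator,
    BatchEvalRunner,
)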
Let's use Paul Graham Essay text for building RAG pipeline.
Download Data
!mkdir -p 'data/paul_graham/'
!curl 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/pa
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
# Define an LLM
llm = OpenAI(model="gpt-4")
query_engine = vector_index.as_query_engine()
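The query itself is not shown here; judging from the response below, it is presumably the classic Paul Graham example question:
response_vector = query_engine.query("What did the author do growing up?")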
Check response.
response_vector.response
'The author wrote short stories and worked on programming, specifically on an IBM 1401 computer
By default it retrieves two similar nodes/ chunks. You can modify that in
vector_index.as_query_engine(similarity_top_k=k) .
"It felt like I was doing life right. I remember that because I was slightly dismayed at how no
We have built a RAG pipeline and now need to evaluate its performance. We can assess our RAG
system/query engine using LlamaIndex's core evaluation modules. Let's examine how to
leverage these tools to quantify the quality of our retrieval-augmented generation system.
Evaluation
Evaluation should serve as the primary metric for assessing your RAG application. It determines
whether the pipeline will produce accurate responses based on the data sources and a range of
queries.
While it's beneficial to examine individual queries and responses at the start, this approach may
become impractical as the volume of edge cases and failures increases. Instead, it may be more
effective to establish a suite of summary metrics or automated evaluations. These tools can
provide insights into overall system performance and indicate specific areas that may require
closer scrutiny.
Retrieval Evaluation: This assesses the accuracy and relevance of the information retrieved
by the system.
Response Evaluation: This measures the quality and appropriateness of the responses
generated by the system based on the retrieved information.
LlamaIndex provides a generate_question_context_pairs utility for building an evaluation dataset of question and context pairs over the nodes, which can be used in the assessment of both Retrieval and Response Evaluation for the RAG system. For more details on Question Generation, please refer to the documentation.
qa_dataset = generate_question_context_pairs(
nodes,
llm=llm,
num_questions_per_chunk=2
)
Retrieval Evaluation:
We are now prepared to conduct our retrieval evaluations. We will execute our
RetrieverEvaluator using the evaluation dataset we have generated.
We first create the Retriever and then define two functions: get_eval_results , which
operates our retriever on the dataset, and display_results , which presents the outcomes of
the evaluation.
retriever = vector_index.as_retriever(similarity_top_k=2)
Define RetrieverEvaluator . We use Hit Rate and MRR metrics to evaluate our Retriever.
Hit Rate:
Hit rate calculates the fraction of queries where the correct answer is found within the top-k
retrieved documents. In simpler terms, it’s about how often our system gets it right within the
top few guesses.
retriever_evaluator = RetrieverEvaluator.from_metric_names(
["mrr", "hit_rate"], retriever=retriever
)
# Evaluate
eval_results = await retriever_evaluator.aevaluate_dataset(qa_dataset)
Let's define a function to display the Retrieval evaluation results in table format.
def display_results(name, eval_results):
    metric_dicts = []
    for eval_result in eval_results:
        metric_dict = eval_result.metric_vals_dict
        metric_dicts.append(metric_dict)
    full_df = pd.DataFrame(metric_dicts)
    hit_rate = full_df["hit_rate"].mean()
    mrr = full_df["mrr"].mean()
    metric_df = pd.DataFrame(
        {"Retriever Name": [name], "Hit Rate": [hit_rate], "MRR": [mrr]}
    )
    return metric_df
Observation:
The Retriever with OpenAI Embedding demonstrates a performance with a hit rate of 0.7586 ,
while the MRR, at 0.6206 , suggests there's room for improvement in ensuring the most
relevant results appear at the top. The observation that MRR is less than the hit rate indicates
that the top-ranking results aren't always the most relevant. Enhancing MRR could involve the
use of rerankers, which refine the order of retrieved documents. For a deeper understanding of
how rerankers can optimize retrieval metrics, refer to the detailed discussion in our blog post.
Response Evaluation:
1. FaithfulnessEvaluator: Measures whether the response from a query engine matches any source nodes, which is useful for detecting whether the response is hallucinated.
2. RelevancyEvaluator: Measures whether the response plus source nodes match the query.
queries = list(qa_dataset.queries.values())
Faithfulness Evaluator
We will use gpt-3.5-turbo for generating response for a given query and gpt-4 for
evaluation.
# gpt-3.5-turbo
gpt35 = OpenAI(temperature=0, model="gpt-3.5-turbo")
service_context_gpt35 = ServiceContext.from_defaults(llm=gpt35)
# gpt-4
gpt4 = OpenAI(temperature=0, model="gpt-4")
service_context_gpt4 = ServiceContext.from_defaults(llm=gpt4)
Create a QueryEngine with gpt-3.5-turbo service_context to generate response for the query.
Create a FaithfulnessEvaluator.
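The corresponding code is not included in this extract; mirroring the RelevancyEvaluator construction below, it presumably looks like this:
# Query engine backed by gpt-3.5-turbo, evaluator backed by gpt-4
vector_index = VectorStoreIndex(nodes, service_context=service_context_gpt35)
query_engine = vector_index.as_query_engine()
faithfulness_gpt4 = FaithfulnessEvaluator(service_context=service_context_gpt4)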
eval_query = queries[10]
eval_query
"Based on the author's experience and observations, why did he consider the AI practices during
response_vector = query_engine.query(eval_query)
eval_result = faithfulness_gpt4.evaluate_response(response=response_vector)
Relevancy Evaluator
RelevancyEvaluator is useful to measure if the response and source nodes (retrieved context)
match the query. Useful to see if response actually answers the query.
relevancy_gpt4 = RelevancyEvaluator(service_context=service_context_gpt4)
# Pick a query
query = queries[10]
query
"Based on the author's experience and observations, why did he consider the AI practices during
# Generate response.
# response_vector has response and source nodes (retrieved context)
response_vector = query_engine.query(query)
# Relevancy evaluation
eval_result = relevancy_gpt4.evaluate_response(
query=query, response=response_vector
)
# You can check whether the evaluation passed.
eval_result.passing
True
# You can also get the feedback for the evaluation.
eval_result.feedback
'YES'
Batch Evaluator:
Now that we have done Faithfulness and Relevancy evaluation independently, we can use LlamaIndex's BatchEvalRunner to compute multiple evaluations in a batched manner.
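The runner setup is not included in the extract; a sketch consistent with the call below (the query subset and worker count are assumptions) might be:
batch_eval_queries = queries[:10]  # evaluate a subset of the generated queries
runner = BatchEvalRunner(
    {"faithfulness": faithfulness_gpt4, "relevancy": relevancy_gpt4},
    workers=8,
)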
# Compute evaluation
eval_results = await runner.aevaluate_queries(
query_engine, queries=batch_eval_queries
)
faithfulness_score
1.0
relevancy_score
1.0
Observation:
Faithfulness score of 1.0 signifies that the generated answers contain no hallucinations and are
entirely based on retrieved context.
Relevancy score of 1.0 suggests that the answers generated are consistently aligned with the
retrieved context and the queries.
Conclusion
In this notebook, we have explored how to build and evaluate a RAG pipeline using LlamaIndex,
with a specific focus on evaluating the retrieval system and generated responses within the
pipeline.
LlamaIndex offers a variety of other evaluation modules as well, which you can explore further
here
You want to use Weaviate with the OpenAI module (text2vec-openai), to generate vector
embeddings for you.
This notebook takes you through a simple flow to set up a Weaviate instance, connect to it (with
OpenAI API key), configure data schema, import data (which will automatically generate vector
embeddings for your data), and run semantic search.
This is a common requirement for customers who want to store and search our embeddings
with their own data in a secure environment to support production use cases such as chatbots,
topic modelling and more.
What is Weaviate
Weaviate is an open-source vector search engine that stores data objects together with their
vectors. This allows for combining vector search with structured filtering.
Weaviate uses KNN algorithms to create a vector-optimized index, which allows your queries to run extremely fast. Learn more here.
Weaviate lets you use your favorite ML models, and scales seamlessly into billions of data objects.
Deployment options
Whatever your scenario or production setup, Weaviate has an option for you. You can deploy
Weaviate in the following setups:
Self-hosted – you can deploy Weaviate with docker locally, or any server you want.
SaaS – you can use Weaviate Cloud Service (WCS) to host your Weaviate instances.
Hybrid-SaaS – you can deploy Weaviate in your own private Cloud Service.
Programming languages
Weaviate offers four client libraries, which allow you to communicate from your apps:
Python
JavaScript
Java
Go
Additionally, Weaviate has a REST layer. Basically you can call Weaviate from any language that
supports REST requests.
Demo Flow
Prerequisites Setup: Create a Weaviate instance and install the required libraries
Note: The import process will automatically index your data - based on the
configuration in the schema
Note: You don't need to explicitly vectorize your data, Weaviate will communicate with
OpenAI to do it for you
Note: You don't need to explicitly vectorize your queries, Weaviate will communicate
with OpenAI to do it for you
Once you've run through this notebook you should have a basic understanding of how to setup
and use vector databases, and can move on to more complex use cases making use of our
embeddings.
This module is responsible for handling vectorization during import (or any CRUD operations)
and when you run a query.
This is great news for you. With text2vec-openai you don't need to manually vectorize your
data, as Weaviate will call OpenAI for you whenever necessary.
1. provide your OpenAI API Key when you connect to the Weaviate client
Prerequisites
install libraries
weaviate-client
datasets
apache-beam
===========================================================
1. (Recommended path) Weaviate Cloud Service – to host your Weaviate instance in the
cloud. The free sandbox should be more than enough for this cookbook.
4. make a note of the Cluster Id . The link will take you to the full path of your cluster (you
will need it later to connect to it). It should be something like: https://your-project-
name.weaviate.network
2. Then open your terminal, navigate to where your docker-compose.yml file is located, and
start docker with: docker-compose up -d
3. Once this is ready, your instance should be available at http://localhost:8080
Note. To shut down your docker instance you can call: docker-compose down
Learn more
To learn more, about using Weaviate with Docker see the installation documentation.
===========================================================
Before running this project make sure to have the following libraries:
===========================================================
The OpenAI API key is used for vectorization of your data at import, and for running queries.
If you don't have an OpenAI API key, you can get one from
https://beta.openai.com/account/api-keys.
Once you get your key, please add it to your environment variables as OPENAI_API_KEY .
# Test that your OpenAI API key is correctly set as an environment variable
# Note. if you run this notebook locally, you will need to reload your terminal and the notebook for
import os
# Note. alternatively you can set a temporary env variable like this:
# os.environ["OPENAI_API_KEY"] = 'your-key-goes-here'
1. test env variable OPENAI_API_KEY – make sure you completed the step in #Prepare-your-
OpenAI-API-key
The client
After this step, the client object will be used to perform all Weaviate-related operations.
import weaviate
from datasets import load_dataset
import os
Schema
In this section, we will:
“This is the second and final step, which requires OpenAI-specific configuration. After this step, the rest of the instructions will only touch on Weaviate, as the OpenAI tasks will be handled automatically.”
What is a schema
In Weaviate you create schemas to capture each of the entities you will be searching.
title
content
url
We want to vectorize title and content , but not the url .
# Define the Schema object to use the OpenAI vectorizer on `title` and `content`, but skip it for `url`
article_schema = {
"class": "Article",
"description": "A collection of articles",
"vectorizer": "text2vec-openai",
"moduleConfig": {
"text2vec-openai": {
"model": "ada",
"modelVersion": "002",
"type": "text"
}
},
"properties": [{
"name": "title",
"description": "Title of the article",
"dataType": ["string"]
},
{
"name": "content",
"description": "Contents of the article",
"dataType": ["text"]
},
{
"name": "url",
"description": "URL to the article",
"dataType": ["string"],
"moduleConfig": { "text2vec-openai": { "skip": True } }
}]
}
Import data
In this section we will:
1. load the Simple Wikipedia dataset
2. configure the Weaviate batch import (to make the import more efficient)
3. import the data into Weaviate
“Note: As mentioned before, we don't need to manually vectorize the data. The text2vec-openai module will take care of that.”
# We'll use the datasets library to pull the Simple Wikipedia dataset for embedding
dataset = list(load_dataset("wikipedia", "20220301.simple")["train"])
client.batch.configure(
batch_size=10,
dynamic=True,
timeout_retries=3,
# callback=None,
)
print("Importing Articles")
counter=0
properties = {
"title": article["title"],
"content": article["text"],
"url": article["url"]
}
batch.add_data_object(properties, "Article")
counter = counter+1
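# The extract doesn't show how `test_article` is fetched; a hypothetical check
# using the same query API as below could be:
test_article = (
    client.query
    .get("Article", ["title", "url", "content"])
    .with_limit(1)
    .do()
)["data"]["Get"]["Article"][0]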
print(test_article['title'])
print(test_article['url'])
print(test_article['content'])
Search Data
As above, we'll fire some queries at our new Index and get back results based on the closeness
to our existing vectors
def query_weaviate(query, collection_name):
    # Helper name is assumed; the body below is from the notebook
    nearText = {
        "concepts": [query],
        "distance": 0.7,
    }

    properties = [
        "title", "content", "url",
        "_additional {certainty distance}"
    ]

    result = (
        client.query
        .get(collection_name, properties)
        .with_near_text(nearText)
        .with_limit(10)
        .do()
    )

    # Check for errors
    if ("errors" in result):
        print("\033[91mYou probably have run out of OpenAI API calls for the current minute.")
        raise Exception(result["errors"][0]['message'])

    return result["data"]["Get"][collection_name]
Thanks for following along, you're now equipped to set up your own vector databases and use
embeddings to do all kinds of cool things - enjoy! For more complex use cases please continue
to work through other cookbook examples in this repo.
This notebook shows how to use LLMs in combination with Neo4j, a graph database, to
perform Retrieval Augmented Generation (RAG).
Reduce hallucinations
If you have data where relationships between data points are important and you might want to
leverage that, then it might be worth considering graph databases instead of traditional
relational databases.
Graph databases are particularly relevant for recommendation systems, network relationships or
analysing correlation between data points.
Recommendation chatbot
AI-augmented CRM
Depending on your use case, you can assess whether using a graph database makes sense.
In this notebook, we will build a product recommendation chatbot, with a graph database that
contains Amazon products data.
Setup
Make sure you have your OpenAI account set up and you have your OpenAI API key handy.
import os
import json
import pandas as pd
Dataset
We will use a dataset that was created from a relational database and converted to a json
format, creating relationships between entities with the completions API.
We will then load this data into the graph db to be able to query it.
Loading dataset
df = pd.read_json(file_path)
df.head()
Connecting to db
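The import for the Neo4jGraph helper is not shown in this extract; with the LangChain version this notebook appears to target, it would presumably be:
from langchain.graphs import Neo4jGraph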
# DB credentials
url = "bolt://localhost:7687"
username ="neo4j"
password = "<your_password_here>"
graph = Neo4jGraph(
url=url,
username=username,
password=password
)
Importing data
def sanitize(text):
text = str(text).replace("'","").replace('"','').replace('{','').replace('}', '')
return text
MERGE (product)-[:{obj['relationship']}]->(entity)
'''
graph.query(query)
We will be using the OpenAIEmbeddings Langchain utility. It's important to note that Langchain
adds a pre-processing step, so the embeddings will slightly differ from those generated directly
with the OpenAI embeddings API.
vector_index = Neo4jVector.from_existing_graph(
OpenAIEmbeddings(model=embeddings_model),
url=url,
username=username,
password=password,
index_name='products',
node_label="Product",
text_node_properties=['name', 'title'],
embedding_node_property='embedding',
)
def embed_entities(entity_type):
vector_index = Neo4jVector.from_existing_graph(
OpenAIEmbeddings(model=embeddings_model),
url=url,
username=username,
password=password,
index_name=entity_type,
node_label=entity_type,
text_node_properties=['value'],
embedding_node_property='embedding',
)
entities_list = df['entity_type'].unique()
for t in entities_list:
embed_entities(t)
Using GraphCypherQAChain , we can generate queries against the database using Natural
Language.
from langchain.chains import GraphCypherQAChain
from langchain.chat_models import ChatOpenAI
chain = GraphCypherQAChain.from_llm(
ChatOpenAI(temperature=0), graph=graph, verbose=True,
)
chain.run("""
Help me find curtains
""")
"I'm sorry, but I don't have any information to help you find curtains."
However, there is little added value here compared to just writing the Cypher queries ourselves,
and it is prone to error.
Indeed, asking an LLM to generate a Cypher query directly might result in the wrong
parameters being used, whether it's the entity type or the relationship type, as is the case
above.
We will instead use LLMs to decide what to search for, and then generate the corresponding
Cypher queries using templates.
For this purpose, we will instruct our model to find relevant entities in the user prompt that can
be used to query our database.
entity_types = {
"product": "Item detailed type, for example 'high waist pants', 'outdoor plant pot', 'chef kitche
"category": "Item category, for example 'home decoration', 'women clothing', 'office supply'",
"characteristic": "if present, item characteristics, for example 'waterproof', 'adhesive', 'easy
"measurement": "if present, dimensions of the item",
"brand": "if present, brand of the item",
"color": "if present, color of the item",
"age_group": "target age group for the product, one of 'babies', 'children', 'teenagers', 'adults
}
relation_types = {
"hasCategory": "item is of this category",
"hasCharacteristic": "item has this characteristic",
"hasMeasurement": "item is of this measurement",
"hasBrand": "item is of this brand",
"hasColor": "item is of this color",
"isFor": "item is for this age_group"
}
entity_relationship_match = {
"category": "hasCategory",
"characteristic": "hasCharacteristic",
"measurement": "hasMeasurement",
"brand": "hasBrand",
"color": "hasColor",
"age_group": "isFor"
}
system_prompt = f'''
You are a helpful agent designed to fetch information from a graph database.
Depending on the user prompt, determine if it is possible to answer with the graph database.
The graph database can match products with multiple relationships to several entities.
If there are no relevant entities in the user prompt, return an empty json object.
'''
print(system_prompt)
example_queries = [
"Which pink items are suitable for children?",
"Help me find gardening gear that is waterproof",
"I'm looking for a bench with dimensions 100x50 for my living room"
]
for q in example_queries:
print(f"Q: '{q}'\n{define_query(q)}\n")
Generating queries
Now that we know what to look for, we can generate the corresponding Cypher queries to
query our database.
However, the entities extracted might not be an exact match with the data we have, so we will
use the GDS cosine similarity function to return products that have relationships with entities
similar to what the user is asking.
def create_embedding(text):
result = client.embeddings.create(model=embeddings_model, input=text)
return result.data[0].embedding
# The threshold defines how closely related words should be. Adjust the threshold to return more or l
def create_query(text, threshold=0.81):
query_data = json.loads(text)
# Creating embeddings
embeddings_data = []
for key, val in query_data.items():
if key != 'product':
embeddings_data.append(f"${key}Embedding AS {key}Embedding")
query = "WITH " + ",\n".join(e for e in embeddings_data)
# Matching products to each entity
query += "\nMATCH (p:Product)\nMATCH "
match_data = []
for key, val in query_data.items():
if key != 'product':
relationship = entity_relationship_match[key]
match_data.append(f"(p)-[:{relationship}]->({key}Var:{key})")
query += ",\n".join(e for e in match_data)
similarity_data = []
for key, val in query_data.items():
if key != 'product':
similarity_data.append(f"gds.similarity.cosine({key}Var.embedding, ${key}Embedding) > {th
query += "\nWHERE "
query += " AND ".join(e for e in similarity_data)
query += "\nRETURN p"
return query
def query_graph(response):
embeddingsParams = {}
query = create_query(response)
query_data = json.loads(response)
for key, val in query_data.items():
embeddingsParams[f"{key}Embedding"] = create_embedding(val)
result = graph.query(query, params=embeddingsParams)
return result
example_response = '''{
"category": "clothes",
"color": "blue",
"age_group": "adults"
}'''
result = query_graph(example_response)
# Result
print(f"Found {len(result)} matching product(s):\n")
for r in result:
print(f"{r['p']['name']} ({r['p']['id']})")
We can then leverage the graph database to find similar products based on common characteristics. This criterion is arbitrary and depends entirely on what is most relevant for your use case.
# Adjust the relationships_threshold to return products that have more or less relationships in commo
def query_similar_items(product_id, relationships_threshold = 3):
similar_items = []
# Fetching items in the same category with at least 1 other entity in common
query_category = '''
MATCH (p:Product {id: $product_id})-[:hasCategory]->(c:category)
MATCH (p)-->(entity)
WHERE NOT entity:category
MATCH (n:Product)-[:hasCategory]->(c)
MATCH (n)-->(commonEntity)
WHERE commonEntity = entity AND p.id <> n.id
RETURN DISTINCT n;
'''
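    # The extract omits running these queries; presumably they are executed with
    # graph.query and the product id as a parameter, e.g.:
    result_category = graph.query(query_category, params={"product_id": int(product_id)})
    # A second, analogous query (not shown here) fills `result_common_entities` with
    # products sharing at least `relationships_threshold` entities in common.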
for i in result_category:
similar_items.append({
"id": i['n']['id'],
"name": i['n']['name']
})
for i in result_common_entities:
result_id = i['n']['id']
if not any(item['id'] == result_id for item in similar_items):
similar_items.append({
"id": result_id,
"name": i['n']['name']
})
return similar_items
product_ids = ['1519827', '2763742']
Final result
Now that we have all the pieces working, we will stitch everything together.
We can also add a fallback option to do a product name/title similarity search if we can't find
relevant entities in the user prompt.
We will explore 2 options, one with a Langchain agent for a conversational experience, and one
that is more deterministic based on code only.
Depending on your use case, you might choose one or the other option and tailor it to your
needs.
def query_db(params):
matches = []
# Querying the db
result = query_graph(params)
for r in result:
product_id = r['p']['id']
matches.append({
"id": product_id,
"name":r['p']['name']
})
return matches
[{'id': 1925202, 'name': 'Blackout Curtain'}, {'id': 1706369, 'name': '100% Blackout Curtains'}
tools = [
Tool(
name="Query",
func=query_db,
description="Use this tool to find entities in the user prompt that can be used to generate q
),
Tool(
name="Similarity Search",
func=similarity_search,
description="Use this tool to perform a similarity search with the products in the database"
)
]
prompt_template = '''Your goal is to find a product in the database that best matches the user prompt
You have access to these tools:
{tools}
Rules to follow:
1. Start by using the Query tool with the prompt as parameter. If you found results, stop here.
2. If the result is an empty array, use the similarity search tool with the full initial user prompt
3. If you still cannot find the answer with this, probe the user to provide more context on th
Keep in mind that we can use entities of the following types to search for products:
{entity_types}.
Never return results if you did not find any results in the array returned by the query tool or the s
If you didn't find any result, reply: "Sorry, I didn't find any suitable products."
If you found results from the database, this is your final answer, reply to the user by announcing th
name_of_the_product (id_of_the_product)"
Only use exact names and ids of the products returned as results when providing your final answer.
User prompt:
{input}
{agent_scratchpad}
'''
prompt = CustomPromptTemplate(
template=prompt_template,
tools=tools,
input_variables=["input", "intermediate_steps"],
)
class CustomOutputParser(AgentOutputParser):
def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
output_parser = CustomOutputParser()
agent = LLMSingleActionAgent(
llm_chain=llm_chain,
output_parser=output_parser,
stop=["\Observation:"],
allowed_tools=tool_names
)
def agent_interaction(user_prompt):
agent_executor.run(user_prompt)
prompt1 = "I'm searching for pink shirts"
agent_interaction(prompt1)
prompt2 = "Can you help me find a toys for my niece, she's 8"
agent_interaction(prompt2)
Indeed, the agent seems to retrieve results from the tools, but comes up with made-up
responses.
For this specific use case, if the conversational aspect is less relevant, we can actually create a
function that will call our previously-defined tasks and provide an answer.
import logging
prompt1 = "I'm looking for food items to gift to someone for Christmas. Ideally chocolate."
answer(prompt1)
prompt2 = "Help me find women clothes for my wife. She likes blue."
answer(prompt2)
prompt4 = "Can you help me find a gift for my niece? She's 8 and she likes pink."
answer(prompt4)
Prompt: "I'm looking for food items to gift to someone for Christmas. Ideally chocolate."
{
"category": "food",
"characteristic": "chocolate"
}
Found 0 matches with Query function.
Prompt: "Help me find women clothes for my wife. She likes blue."
{
"color": "blue",
"category": "women clothing"
}
Found 15 matches with Query function.
Conclusion
User experience
When the primary objective is to extract specific information from our database, Large
Language Models (LLMs) can significantly enhance our querying capabilities.
However, it's crucial to base much of this process on robust code logic to ensure a foolproof
user experience.
Ultimately, the design choice depends on the desired user experience. For instance, if the aim is
to create a visual recommendation system, the importance of a conversational interface is less
relevant.
The querying part of this notebook would work on a relational database as well; the knowledge graph comes in handy when we want to couple the results with similar items that the graph surfaces.
Considering the added complexity, make sure using a knowledge graph is the best option for your use case. If it is, feel free to refine what this cookbook presents to match your needs and perform even better!
This notebook takes you through a simple flow to download some data, embed it, and then
index and search it using a selection of vector databases. This is a common requirement for
customers who want to store and search our embeddings with their own data in a secure
environment to support production use cases such as chatbots, topic modelling and more.
Vector databases enable enterprises to take many of the embeddings use cases we've shared in
this repo (question and answering, chatbot and recommendation services, for example), and
make use of them in a secure, scalable environment. Many of our customers use embeddings to
solve their problems at small scale, but performance and security hold them back from going
into production - we see vector databases as a key component in solving that, and in this guide
we'll walk through the basics of embedding text data, storing it in a vector database and using it
for semantic search.
Demo Flow
The demo flow is:
Setup: Import packages and set any required variables
Weaviate
Setup: Here we'll set up the Python client for Weaviate. For more details go here
Once you've run through this notebook you should have a basic understanding of how to set up
and use vector databases, and can move on to more complex use cases making use of our
embeddings.
Setup
Import the required libraries and set the embedding model that we'd like to use.
import openai
import pandas as pd

# I've set this to our new embeddings model, this can be changed to the embedding model of your choice
EMBEDDING_MODEL = "text-embedding-3-small"

# Ignore unclosed SSL socket warnings - optional in case you get these errors
import warnings
warnings.filterwarnings(action="ignore", message="unclosed", category=ResourceWarning)
In this section we'll load embedded data that we've prepared previous to this session.
embeddings_url = 'https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip'
import zipfile
with zipfile.ZipFile("vector_database_wikipedia_articles_embedded.zip","r") as zip_ref:
zip_ref.extractall("../data")
article_df = pd.read_csv('../data/vector_database_wikipedia_articles_embedded.csv')
article_df.head()
article_df.info(show_counts=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25000 entries, 0 to 24999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 25000 non-null int64
1 url 25000 non-null object
2 title 25000 non-null object
3 text 25000 non-null object
4 title_vector 25000 non-null object
5 content_vector 25000 non-null object
6 vector_id 25000 non-null object
dtypes: int64(1), object(6)
memory usage: 1.3+ MB
Weaviate
Another vector database option we'll explore is Weaviate, which offers both a managed, SaaS
option, as well as a self-hosted open source option. As we've already looked at a cloud vector
database, we'll try the self-hosted option here.
For scenarios where your data is not vectorized yet, you can delegate the vectorization task to
Weaviate, which will call OpenAI for you. Weaviate offers a built-in module, text2vec-openai,
which takes care of the vectorization at import time.
Check out the Getting Started with Weaviate and OpenAI module cookbook to learn step by
step how to import and vectorize data in one step.
Setup
To run Weaviate locally, you'll need Docker. Following the instructions contained in the
Weaviate documentation here, we created an example docker-compose.yml file in this repo
saved at ./weaviate/docker-compose.yml.
After starting Docker, you can start Weaviate locally by navigating to the
examples/vector_databases/weaviate/ directory and running docker-compose up -d .
SaaS
Alternatively you can use Weaviate Cloud Service (WCS) to create a free Weaviate cluster.
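The notebook then checks the connection; the client creation itself is not shown, but a minimal sketch looks like this (the local Docker URL is the default, and a WCS cluster URL can be used instead):
import os
import weaviate

client = weaviate.Client(
    url="http://localhost:8080",  # or your WCS cluster URL
    additional_headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")},
)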
client.is_ready()
Index data
In Weaviate you create schemas to capture each of the entities you will be searching.
In this case we'll create a schema called Article with the title vector from above included for us
to search by.
The next few steps closely follow the documentation Weaviate provides here.
# Define the Schema object to use `text-embedding-3-small` on `title` and `content`, but skip it for
article_schema = {
"class": "Article",
"description": "A collection of articles",
"vectorizer": "text2vec-openai",
"moduleConfig": {
"text2vec-openai": {
"model": "ada",
"modelVersion": "002",
"type": "text"
}
},
"properties": [{
"name": "title",
"description": "Title of the article",
"dataType": ["string"]
},
{
"name": "content",
"description": "Contents of the article",
"dataType": ["text"],
"moduleConfig": { "text2vec-openai": { "skip": True } }
}]
}
### Step 1 - configure Weaviate Batch, which optimizes CRUD operations in bulk
# - starting batch size of 100
# - dynamically increase/decrease based on performance
# - add timeout retries if something goes wrong
client.batch.configure(
batch_size=100,
dynamic=True,
timeout_retries=3,
)
<weaviate.batch.crud_batch.Batch at 0x3f0ca0fa0>
counter = 0
with client.batch as batch:
    for k, v in article_df.iterrows():
        properties = {
            "title": v["title"],
            "content": v["text"]
        }
        # supply the precomputed title vector instead of re-embedding with OpenAI
        vector = v["title_vector"]
        batch.add_data_object(properties, "Article", vector=vector)
        counter = counter + 1
print(test_article["_additional"]["id"])
print(test_article["title"])
print(test_article["content"])
000393f2-1182-4e3d-abcf-4217eda64be0
Lago d'Origlio
Lago d'Origlio is a lake in the municipality of Origlio, in Ticino, Switzerland.
Lakes of Ticino
Search data
As above, we'll fire some queries at our new Index and get back results based on the closeness
to our existing vectors.
Weaviate has a built-in module for OpenAI, which takes care of the steps required to generate
a vector embedding for your queries and any CRUD operations.
This allows you to run a vector query with the with_near_text filter, which uses your
OPENAI_API_KEY .
def query_weaviate(query, collection_name):
    nearText = {
        "concepts": [query],
        "distance": 0.7,
    }
    properties = [
        "title", "content",
        "_additional {certainty distance}"
    ]
    query_result = (
        client.query
        .get(collection_name, properties)
        .with_near_text(nearText)
        .with_limit(20)
        .do()
    )["data"]["Get"][collection_name]
    return query_result
Objects returned: 20
1. Museum of Modern Art (Certainty: 0.938) (Distance: 0.125)
2. Western Europe (Certainty: 0.934) (Distance: 0.133)
3. Renaissance art (Certainty: 0.932) (Distance: 0.136)
4. Pop art (Certainty: 0.93) (Distance: 0.14)
5. Northern Europe (Certainty: 0.927) (Distance: 0.145)
6. Hellenistic art (Certainty: 0.926) (Distance: 0.147)
7. Modernist literature (Certainty: 0.923) (Distance: 0.153)
8. Art film (Certainty: 0.922) (Distance: 0.157)
9. Central Europe (Certainty: 0.921) (Distance: 0.157)
10. European (Certainty: 0.921) (Distance: 0.159)
11. Art (Certainty: 0.921) (Distance: 0.159)
12. Byzantine art (Certainty: 0.92) (Distance: 0.159)
13. Postmodernism (Certainty: 0.92) (Distance: 0.16)
14. Eastern Europe (Certainty: 0.92) (Distance: 0.161)
15. Europe (Certainty: 0.919) (Distance: 0.161)
16. Cubism (Certainty: 0.919) (Distance: 0.161)
17. Impressionism (Certainty: 0.919) (Distance: 0.162)
18. Bauhaus (Certainty: 0.919) (Distance: 0.162)
19. Surrealism (Certainty: 0.918) (Distance: 0.163)
20. Expressionism (Certainty: 0.918) (Distance: 0.163)
Objects returned: 20
1. Historic Scotland (Certainty: 0.946) (Distance: 0.107)
2. First War of Scottish Independence (Certainty: 0.946) (Distance: 0.108)
3. Battle of Bannockburn (Certainty: 0.946) (Distance: 0.109)
4. Wars of Scottish Independence (Certainty: 0.944) (Distance: 0.111)
5. Second War of Scottish Independence (Certainty: 0.94) (Distance: 0.121)
6. List of Scottish monarchs (Certainty: 0.937) (Distance: 0.127)
7. Scottish Borders (Certainty: 0.932) (Distance: 0.137)
8. Braveheart (Certainty: 0.929) (Distance: 0.141)
9. John of Scotland (Certainty: 0.929) (Distance: 0.142)
10. Guardians of Scotland (Certainty: 0.926) (Distance: 0.148)
11. Holyrood Abbey (Certainty: 0.925) (Distance: 0.15)
12. Scottish (Certainty: 0.925) (Distance: 0.15)
13. Scots (Certainty: 0.925) (Distance: 0.15)
14. Robert I of Scotland (Certainty: 0.924) (Distance: 0.151)
15. Scottish people (Certainty: 0.924) (Distance: 0.152)
16. Edinburgh Castle (Certainty: 0.924) (Distance: 0.153)
17. Alexander I of Scotland (Certainty: 0.924) (Distance: 0.153)
18. Robert Burns (Certainty: 0.924) (Distance: 0.153)
19. Battle of Bosworth Field (Certainty: 0.922) (Distance: 0.155)
20. David II of Scotland (Certainty: 0.922) (Distance: 0.157)
You want to use Weaviate with the OpenAI module (text2vec-openai), to generate vector
embeddings for you.
This notebook takes you through a simple flow to set up a Weaviate instance, connect to it (with
OpenAI API key), configure data schema, import data (which will automatically generate vector
embeddings for your data), and run hybrid search (mixing of vector and BM25 search).
This is a common requirement for customers who want to store and search our embeddings
with their own data in a secure environment to support production use cases such as chatbots,
topic modelling and more.
What is Weaviate
Weaviate is an open-source vector search engine that stores data objects together with their
vectors. This allows for combining vector search with structured filtering.
Weaviate uses KNN algorithms to create a vector-optimized index, which allows your queries
to run extremely fast. Learn more here.
Weaviate lets you use your favorite ML models, and scale seamlessly into billions of data objects.
Deployment options
Whatever your scenario or production setup, Weaviate has an option for you. You can deploy
Weaviate in the following setups:
Self-hosted – you can deploy Weaviate with docker locally, or any server you want.
SaaS – you can use Weaviate Cloud Service (WCS) to host your Weaviate instances.
Hybrid-SaaS – you can deploy Weaviate in your own private Cloud Service
Programming languages
Weaviate offers four client libraries, which allow you to communicate from your apps:
Python
JavaScript
Java
Go
Additionally, Weaviate has a REST layer. Basically you can call Weaviate from any language that
supports REST requests.
Demo Flow
Note: The import process will automatically index your data - based on the
configuration in the schema
Note: You don't need to explicitly vectorize your data, Weaviate will communicate with
OpenAI to do it for you
Note: You don't need to explicitly vectorize your queries, Weaviate will communicate
with OpenAI to do it for you
Once you've run through this notebook you should have a basic understanding of how to set up
and use vector databases, and can move on to more complex use cases making use of our
embeddings.
This module is responsible for handling vectorization during import (or any CRUD operations)
and when you run a query.
This is great news for you. With text2vec-openai you don't need to manually vectorize your
data, as Weaviate will call OpenAI for you whenever necessary.
1. provide your OpenAI API Key – when you connect to the Weaviate Client
Prerequisites
install libraries
weaviate-client
datasets
apache-beam
===========================================================
There are two ways to host your Weaviate instance:
1. (Recommended path) Weaviate Cloud Service – to host your Weaviate instance in the
cloud. The free sandbox should be more than enough for this cookbook. Once the cluster is
created, make a note of the Cluster Id. The link will take you to the full path of your cluster
(you will need it later to connect to it). It should be something like: https://your-project-name.weaviate.network
2. Local instance with Docker – open your terminal, navigate to where your docker-compose.yml
file is located, and start docker with: docker-compose up -d . Once this is ready, your
instance should be available at http://localhost:8080
Note. To shut down your docker instance you can call: docker-compose down
Learn more
To learn more about using Weaviate with Docker see the installation documentation.
===========================================================
Before running this project make sure to have the following libraries:
===========================================================
The OpenAI API key is used for vectorization of your data at import, and for running queries.
If you don't have an OpenAI API key, you can get one from
https://beta.openai.com/account/api-keys.
Once you get your key, please add it to your environment variables as OPENAI_API_KEY .
# Test that your OpenAI API key is correctly set as an environment variable
# Note. if you run this notebook locally, you will need to reload your terminal and the notebook for the env variables to be live.
import os
# Note. alternatively you can set a temporary env variable like this:
# os.environ['OPENAI_API_KEY'] = 'your-key-goes-here'
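The check itself can be as simple as this sketch:
if os.getenv("OPENAI_API_KEY") is not None:
    print("OPENAI_API_KEY is ready")
else:
    print("OPENAI_API_KEY environment variable not found")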
1. test env variable OPENAI_API_KEY – make sure you completed the step in #Prepare-your-
OpenAI-API-key
The client
After this step, the client object will be used to perform all Weaviate-related operations.
import weaviate
from datasets import load_dataset
import os
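The connection itself needs the instance URL and your OpenAI key, which Weaviate forwards to the text2vec-openai module. A minimal sketch (the URL and the WEAVIATE_URL variable name are assumptions that depend on your setup):
client = weaviate.Client(
    url=os.getenv("WEAVIATE_URL", "http://localhost:8080"),  # or your WCS cluster URL
    additional_headers={"X-OpenAI-Api-Key": os.getenv("OPENAI_API_KEY")},
)
client.is_ready()  # should return True once the instance is reachable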
Schema
In this section, we will:
“This is the second and final step, which requires OpenAI specific configuration. After this
step, the rest of the instructions will only touch on Weaviate, as the OpenAI tasks will be
handled automatically.”
What is a schema
In Weaviate you create schemas to capture each of the entities you will be searching.
title
content
url
We want to vectorize title and content , but not the url .
# Define the Schema object to use `text-embedding-3-small` on `title` and `content`, but skip it for `url`
article_schema = {
"class": "Article",
"description": "A collection of articles",
"vectorizer": "text2vec-openai",
"moduleConfig": {
"text2vec-openai": {
"model": "ada",
"modelVersion": "002",
"type": "text"
}
},
"properties": [{
"name": "title",
"description": "Title of the article",
"dataType": ["string"]
},
{
"name": "content",
"description": "Contents of the article",
"dataType": ["text"]
},
{
"name": "url",
"description": "URL to the article",
"dataType": ["string"],
"moduleConfig": { "text2vec-openai": { "skip": True } }
}]
}
Import data
2. configure Weaviate Batch import (to make the import more efficient)
3. import the data into Weaviate
“Note: As mentioned before, we don't need to manually vectorize the data. The text2vec-
openai module will take care of that.”
# We'll use the datasets library to pull the Simple Wikipedia dataset for embedding
dataset = list(load_dataset("wikipedia", "20220301.simple")["train"])
client.batch.configure(
batch_size=10,
dynamic=True,
timeout_retries=3,
# callback=None,
)
print("Importing Articles")
counter=0
with client.batch as batch:
    for article in dataset:
        if (counter %10 == 0):
            print(f"Import {counter} / {len(dataset)} ")
        properties = {
            "title": article["title"],
            "content": article["text"],
            "url": article["url"]
        }
        batch.add_data_object(properties, "Article")
        counter = counter+1
print(test_article['title'])
print(test_article['url'])
print(test_article['content'])
Search Data
As above, we'll fire some queries at our new Index and get back results based on the closeness
to our existing vectors
nearText = {
"concepts": [query],
"distance": 0.7,
}
properties = [
"title", "content", "url",
"_additional { score }"
]
result = (
client.query
.get(collection_name, properties)
.with_hybrid(nearText, alpha=alpha_val)
.with_limit(10)
.do()
)
return result["data"]["Get"][collection_name]
Thanks for following along, you're now equipped to set up your own vector databases and use
embeddings to do all kinds of cool things - enjoy! For more complex use cases please continue
to work through other cookbook examples in this repo.
You want to run Q&A (learn more) on your data based on the OpenAI completions
endpoint.
You want to use Weaviate with the OpenAI module (text2vec-openai), to generate vector
embeddings for you.
This notebook takes you through a simple flow to set up a Weaviate instance, connect to it (with
OpenAI API key), configure data schema, import data (which will automatically generate vector
embeddings for your data), and run question answering.
What is Weaviate
Weaviate is an open-source vector search engine that stores data objects together with their
vectors. This allows for combining vector search with structured filtering.
Weaviate uses KNN algorithms to create a vector-optimized index, which allows your queries
to run extremely fast. Learn more here.
Weaviate lets you use your favorite ML models, and scale seamlessly into billions of data objects.
Deployment options
Whatever your scenario or production setup, Weaviate has an option for you. You can deploy
Weaviate in the following setups:
Self-hosted – you can deploy Weaviate with docker locally, or any server you want.
SaaS – you can use Weaviate Cloud Service (WCS) to host your Weaviate instances.
Hybrid-SaaS – you can deploy Weaviate in your own private Cloud Service
Programming languages
Weaviate offers four client libraries, which allow you to communicate from your apps:
Python
JavaScript
Java
Go
Additionally, Weaviate has a REST layer. Basically you can call Weaviate from any language that
supports REST requests.
Demo Flow
Note: The import process will automatically index your data - based on the
configuration in the schema
Note: You don't need to explicitly vectorize your data, Weaviate will communicate with
OpenAI to do it for you
Note: You don't need to explicitly vectorize your queries, Weaviate will communicate
with OpenAI to do it for you
Once you've run through this notebook you should have a basic understanding of how to set up
and use vector databases for question answering.
All Weaviate instances come equipped with the text2vec-openai and the qna-openai modules.
The first module is responsible for handling vectorization at import (or any CRUD operations)
and when you run a search query. The second module communicates with the OpenAI
completions endpoint.
This is great news for you. With text2vec-openai you don't need to manually vectorize your
data, as Weaviate will call OpenAI for you whenever necessary.
1. provide your OpenAI API Key – when you connect to the Weaviate Client
Prerequisites
install libraries
weaviate-client
datasets
apache-beam
===========================================================
1. (Recommended path) Weaviate Cloud Service – to host your Weaviate instance in the
cloud. The free sandbox should be more than enough for this cookbook.
Make a note of the Cluster Id. The link will take you to the full path of your cluster (you
will need it later to connect to it). It should be something like: https://your-project-name.weaviate.network
Note. To shut down your docker instance you can call: docker-compose down
Learn more
To learn more about using Weaviate with Docker see the installation documentation.
===========================================================
Before running this project make sure to have the following libraries:
The Weaviate Python client allows you to communicate with your Weaviate instance from your
Python project.
To load sample data, you need the datasets library and its dependency apache-beam .
===========================================================
The OpenAI API key is used for vectorization of your data at import, and for queries.
If you don't have an OpenAI API key, you can get one from
https://beta.openai.com/account/api-keys.
Once you get your key, please add it to your environment variables as OPENAI_API_KEY .
# Test that your OpenAI API key is correctly set as an environment variable
# Note. if you run this notebook locally, you will need to reload your terminal and the notebook for the env variables to be live.
import os
# Note. alternatively you can set a temporary env variable like this:
# os.environ['OPENAI_API_KEY'] = 'your-key-goes-here'
1. test env variable OPENAI_API_KEY – make sure you completed the step in #Prepare-your-
OpenAI-API-key
The client
After this step, the client object will be used to perform all Weaviate-related operations.
import weaviate
from datasets import load_dataset
import os
Schema
In this section, we will:
“This is the second and final step, which requires OpenAI specific configuration. After this
step, the rest of the instructions will only touch on Weaviate, as the OpenAI tasks will be
handled automatically.”
What is a schema
In Weaviate you create schemas to capture each of the entities you will be searching.
title
content
url
We want to vectorize title and content , but not the url .
To vectorize and query the data, we will use text-embedding-3-small . For Q&A we will use
gpt-3.5-turbo-instruct .
# Define the Schema object to use `text-embedding-3-small` on `title` and `content`, but skip it for `url`
article_schema = {
"class": "Article",
"description": "A collection of articles",
"vectorizer": "text2vec-openai",
"moduleConfig": {
"text2vec-openai": {
"model": "ada",
"modelVersion": "002",
"type": "text"
},
"qna-openai": {
"model": "gpt-3.5-turbo-instruct",
"maxTokens": 16,
"temperature": 0.0,
"topP": 1,
"frequencyPenalty": 0.0,
"presencePenalty": 0.0
}
},
"properties": [{
"name": "title",
"description": "Title of the article",
"dataType": ["string"]
},
{
"name": "content",
"description": "Contents of the article",
"dataType": ["text"]
},
{
"name": "url",
"description": "URL to the article",
"dataType": ["string"],
"moduleConfig": { "text2vec-openai": { "skip": True } }
}]
}
2. configure Weaviate Batch import (to make the import more efficient)
“Note: As mentioned before, we don't need to manually vectorize the data. The text2vec-
openai module will take care of that.”
# We'll use the datasets library to pull the Simple Wikipedia dataset for embedding
dataset = list(load_dataset("wikipedia", "20220301.simple")["train"])
client.batch.configure(
batch_size=10,
dynamic=True,
timeout_retries=3,
# callback=None,
)
print("Importing Articles")
counter=0
with client.batch as batch:
for article in dataset:
if (counter %10 == 0):
print(f"Import {counter} / {len(dataset)} ")
properties = {
"title": article["title"],
"content": article["text"],
"url": article["url"]
}
batch.add_data_object(properties, "Article")
counter = counter+1
print(test_article['title'])
print(test_article['url'])
print(test_article['content'])
As above, we'll fire some queries at our new Index and get back results based on the closeness
to our existing vectors
properties = [
"title", "content", "url",
"_additional { answer { hasAnswer property result startPosition endPosition } distance }"
]
ask = {
"question": query,
"properties": ["content"]
}
result = (
client.query
.get(collection_name, properties)
.with_ask(ask)
.with_limit(1)
.do()
)
return result["data"]["Get"][collection_name]
Thanks for following along, you're now equipped to set up your own vector databases and use
embeddings to do all kinds of cool things - enjoy! For more complex use cases please continue
to work through other cookbook examples in this repo.
In this notebook we will be going over generating embeddings of movie descriptions with
OpenAI and using those embeddings within Zilliz to find relevant movies. To narrow our search
results and try something new, we are going to be using filtering to do metadata searches. The
dataset in this example is sourced from HuggingFace datasets, and contains a little over 8
thousand movie entries.
Let's begin by first downloading the required libraries for this notebook:
To get Zilliz up and running take a look here. With your account and database set up, proceed
to set the following values:
import openai
URI = 'your_uri'
TOKEN = 'your_token' # TOKEN == user:password or api_key
COLLECTION_NAME = 'book_search'
DIMENSION = 1536
OPENAI_ENGINE = 'text-embedding-3-small'
openai.api_key = 'sk-your_key'
INDEX_PARAM = {
'metric_type':'L2',
'index_type':"AUTOINDEX",
'params':{}
}
QUERY_PARAM = {
"metric_type": "L2",
"params": {},
}
BATCH_SIZE = 1000
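With these settings in place, the notebook creates a collection before loading data. The collection setup is not shown here; a minimal sketch with pymilvus, assuming one VARCHAR field per metadata column and a FLOAT_VECTOR field for the description embedding:
from pymilvus import connections, utility, FieldSchema, CollectionSchema, DataType, Collection

# connect to the Zilliz cluster with the URI and token set above
connections.connect(uri=URI, token=TOKEN)

# start from a clean collection on re-runs
if utility.has_collection(COLLECTION_NAME):
    utility.drop_collection(COLLECTION_NAME)

# one field per metadata column, plus the description embedding
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=64000),
    FieldSchema(name="type", dtype=DataType.VARCHAR, max_length=64000),
    FieldSchema(name="release_year", dtype=DataType.INT64),
    FieldSchema(name="rating", dtype=DataType.VARCHAR, max_length=64000),
    FieldSchema(name="description", dtype=DataType.VARCHAR, max_length=64000),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=DIMENSION),
]
collection = Collection(name=COLLECTION_NAME, schema=CollectionSchema(fields=fields))

# index the vector field with the parameters defined above, then load for search
collection.create_index(field_name="embedding", index_params=INDEX_PARAM)
collection.load()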
Dataset
With Zilliz up and running we can begin grabbing our data. Hugging Face Datasets is a hub
that holds many different user datasets, and for this example we are using HuggingLearners's
netflix-shows dataset. This dataset contains movies and their metadata pairs for over 8
thousand movies. We are going to embed each description and store it within Zilliz along with
its title, type, release_year and rating.
import datasets
Now that we have our data on our machine we can begin embedding it and inserting it into
Zilliz. The embedding function takes in text and returns the embeddings in a list format.
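That embedding function is not reproduced above; a minimal sketch, assuming the pre-1.0 openai Python client that matches the openai.api_key setup earlier:
def embed(texts):
    # returns one embedding (a list of floats) per input string
    response = openai.Embedding.create(input=texts, engine=OPENAI_ENGINE)
    return [record["embedding"] for record in response["data"]]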
data = [
[], # title
[], # type
[], # release_year
[], # rating
[], # description
]
With our data safely inserted into Zilliz, we can now perform a query. The query takes in a tuple
of the movie description you are searching for and the filter to use. More info about the filter
can be found here. The search first prints out your description and filter expression. After that
for each result we print the score, title, type, release year, rating and description of the result
movies.
import textwrap
my_query = ('movie about a fluffly animal', 'release_year < 2019 and rating like \"PG%\"')
query(my_query)
Description: movie about a fluffly animal Expression: release_year < 2019 and rating like "PG%"
Results:
Rank: 1 Score: 0.30085673928260803 Title: The Lamb
Type: Movie Release Year: 2017 Rating: PG
A big-dreaming donkey escapes his menial existence and befriends some free-spirited
animal pals in this imaginative retelling of the Nativity Story.
'Not the help page you were looking for? Try these similar pages.'
This notebook demonstrates how to use embeddings to find similar items to recommend. In
particular, we use AG's corpus of news articles as our dataset.
Our model will answer the question: given an article, what other articles are most similar to it?
import pandas as pd
import pickle
EMBEDDING_MODEL = "text-embedding-3-small"
2. Load data
Next, let's load the AG news data and see what it looks like.
n_examples = 5
df.head(n_examples)
1  Nvidia Puts a Firewall on a Motherboard (PC Wo...   PC World - Upcoming chip set will include buil...   4  Sci/Tech
2  Olympic joy in Greek, Chinese press                 Newspapers in Greece reflect a mixture of exhi...    2  Sports
3  U2 Can iPod with Pictures                           SAN JOSE, Calif. -- Apple Computer (Quote, Cha...   4  Sci/Tech
4  The Dream Factory                                   Any product, any shape, any size -- manufactur...   4  Sci/Tech
Let's take a look at those same examples, but not truncated by ellipses.
Before getting embeddings for these articles, let's set up a cache to save the embeddings we
generate. In general, it's a good idea to save your embeddings so you can re-use them later. If
you don't save them, you'll pay again each time you compute them again.
The cache is a dictionary that maps tuples of (text, model) to an embedding, which is a list of
floats. The cache is saved as a Python pickle file.
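A sketch of how such a cache might be set up (the cache path is an assumption):
# establish a cache of embeddings to avoid recomputing;
# the cache maps tuples (text, model) -> embedding and is saved as a pickle file
embedding_cache_path = "data/recommendations_embeddings_cache.pkl"  # path is an assumption
try:
    embedding_cache = pd.read_pickle(embedding_cache_path)
except FileNotFoundError:
    embedding_cache = {}
with open(embedding_cache_path, "wb") as embedding_cache_file:
    pickle.dump(embedding_cache, embedding_cache_file)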
# define a function to retrieve embeddings from the cache if present, and otherwise request via the A
def embedding_from_string(
string: str,
model: str = EMBEDDING_MODEL,
embedding_cache=embedding_cache
) -> list:
"""Return embedding of given string, using a cache to avoid recomputing."""
if (string, model) not in embedding_cache.keys():
embedding_cache[(string, model)] = get_embedding(string, model)
with open(embedding_cache_path, "wb") as embedding_cache_file:
pickle.dump(embedding_cache, embedding_cache_file)
return embedding_cache[(string, model)]
Example string: BRITAIN: BLAIR WARNS OF CLIMATE THREAT Prime Minister Tony Blair urged the inte
2. Calculate the distance between a source title and all other articles
def print_recommendations_from_strings(
    strings: list[str],
    index_of_source_string: int,
    k_nearest_neighbors: int = 1,
    model=EMBEDDING_MODEL,
) -> list[int]:
    """Print out the k nearest neighbors of a given string."""
    # get embeddings for all strings
    embeddings = [embedding_from_string(string, model=model) for string in strings]
    # get the embedding of the source string
    query_embedding = embeddings[index_of_source_string]
    # get distances between the source embedding and other embeddings (function from utils.embeddings_utils)
    distances = distances_from_embeddings(query_embedding, embeddings, distance_metric="cosine")
    # get indices of the nearest neighbors (function from utils.embeddings_utils)
    indices_of_nearest_neighbors = indices_of_nearest_neighbors_from_distances(distances)
    return indices_of_nearest_neighbors
5. Example recommendations
Let's look for articles similar to the first one, which was about Tony Blair.
article_descriptions = df["description"].tolist()
tony_blair_articles = print_recommendations_from_strings(
strings=article_descriptions, # let's base similarity off of the article description
index_of_source_string=0, # articles similar to the first one about Tony Blair
k_nearest_neighbors=5, # 5 most similar articles
)
Source string: BRITAIN: BLAIR WARNS OF CLIMATE THREAT Prime Minister Tony Blair urged the inter
Pretty good! 4 of the 5 recommendations explicitly mention Tony Blair and the fifth is an article
from London about climate change, topics that might often be associated with Tony Blair.
Let's see how our recommender does on the second example article about NVIDIA's new
chipset with more security.
chipset_security_articles = print_recommendations_from_strings(
strings=article_descriptions, # let's base similarity off of the article description
index_of_source_string=1, # let's look at articles similar to the second one about a more secure
k_nearest_neighbors=5, # let's look at the 5 most similar articles
)
Source string: PC World - Upcoming chip set will include built-in security features for your PC
From the printed distances, you can see that the #1 recommendation is much closer than all the
others (0.11 vs 0.14+). And the #1 recommendation looks very similar to the starting article - it's
another article from PC World about increasing computer security. Pretty good!
A more sophisticated way to build a recommender system is to train a machine learning model
that takes in tens or hundreds of signals, such as item popularity or user click data. Even in this
system, embeddings can be a very useful signal into the recommender, especially for items that
are being 'cold started' with no user data yet (e.g., a brand new product added to the catalog
without any clicks yet).
Appendix: Using embeddings to visualize similar articles
To get a sense of what our nearest neighbor recommender is doing, let's visualize the article
embeddings. Although we can't plot the 1536 dimensions of each embedding vector, we can
use techniques like t-SNE or PCA to compress the embeddings down into 2 or 3 dimensions,
which we can chart.
Before visualizing the nearest neighbors, let's visualize all of the article descriptions using t-SNE.
Note that t-SNE is not deterministic, meaning that results may vary from run to run.
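The t-SNE components themselves can be computed along these lines (a sketch with scikit-learn; the label column name and the chart_from_components plotting helper used below are assumptions):
import numpy as np
from sklearn.manifold import TSNE

# embed every description (served from the cache built earlier)
embeddings = [embedding_from_string(string) for string in article_descriptions]

# compress the embeddings down to 2 components
tsne = TSNE(n_components=2, perplexity=15, random_state=42, init="random", learning_rate=200)
tsne_components = tsne.fit_transform(np.array(embeddings))

# color each point by its news category (column name is an assumption)
labels = df["label"].tolist()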
chart_from_components(
components=tsne_components,
labels=labels,
strings=article_descriptions,
width=600,
height=500,
title="t-SNE components of article descriptions",
)
As you can see in the chart above, even the highly compressed embeddings do a good job of
clustering article descriptions by category. And it's worth emphasizing: this clustering is done
with no knowledge of the labels themselves!
Also, if you look closely at the most egregious outliers, they are often due to mislabeling rather
than poor embedding. For example, the majority of the blue World points in the green Sports
cluster appear to be Sports stories.
Next, let's recolor the points by whether they are a source article, its nearest neighbors, or other.
Looking at the 2D chart above, we can see that the articles about Tony Blair are somewhat close
together inside of the World news cluster. Interestingly, although the 5 nearest neighbors (red)
were closest in high dimensional space, they are not the closest points in this compressed 2D
space. Compressing the embeddings down to 2 dimensions discards much of their information,
and the nearest neighbors in the 2D space don't seem to be as relevant as those in the full
embedding space.
For the chipset security example, the 4 closest nearest neighbors in the full embedding space
remain nearest neighbors in this compressed 2D visualization. The fifth is displayed as more
distant, despite being closer in the full embedding space.
Should you want to, you can also make an interactive 3D plot of the embeddings with the
function chart_from_components_3D . (Doing so will require recomputing the t-SNE components
with n_components=3 .)
This notebook provides an introduction to using Redis as a vector database with OpenAI
embeddings and running hybrid queries that combine VSS and lexical search using Redis Query
and Search capability. Redis is a scalable, real-time database that can be used as a vector
database when using the RediSearch Module. The Redis Query and Search capability allows you
to index and search for vectors in Redis. This notebook will show you how to use the Redis
Query and Search to index and search for vectors created by using the OpenAI API and stored
in Redis.
Hybrid queries combine vector similarity with traditional Redis Query and Search filtering
capabilities on GEO, NUMERIC, TAG or TEXT data simplifying application code. A common
example of a hybrid query in an e-commerce use case is to find items visually similar to a given
query image limited to items available in a GEO location and within a price range.
Prerequisites
install libraries
Redis-py
===========================================================
Start Redis
To keep this example simple, we will use the Redis Stack docker container which we can start as
follows
$ docker-compose up -d
This also includes the RedisInsight GUI for managing your Redis database which you can view
at http://localhost:8001 once you start the docker container.
You're all set up and ready to go! Next, we import and create our client for communicating with
the Redis database we just created.
Install Requirements
Redis-Py is the python client for communicating with Redis. We will use this to communicate
with our Redis-stack database.
If you don't have an OpenAI API key, you can get one from
https://beta.openai.com/account/api-keys.
Once you get your key, please add it to your environment variables as OPENAI_API_KEY by
using the following command:
# Test that your OpenAI API key is correctly set as an environment variable
# Note. if you run this notebook locally, you will need to reload your terminal and the notebook for the env variables to be live.
import os
import openai
os.environ["OPENAI_API_KEY"] = '<YOUR_OPENAI_API_KEY>'
OPENAI_API_KEY is ready
Load data
In this section we'll load and clean an ecommerce dataset. We'll generate embeddings using
OpenAI and use this data to create an index in Redis and then search for similar vectors.
import pandas as pd
import numpy as np
from typing import List
EMBEDDING_MODEL = "text-embedding-3-small"
# load in data and clean data types and drop null rows
df = pd.read_csv("../../data/styles_2k.csv", on_bad_lines='skip')
df.dropna(inplace=True)
df["year"] = df["year"].astype(int)
df.info()
# print dataframe
n_examples = 5
df.head(n_examples)
<class 'pandas.core.frame.DataFrame'>
Index: 1978 entries, 0 to 1998
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 1978 non-null int64
1 gender 1978 non-null object
2 masterCategory 1978 non-null object
3 subCategory 1978 non-null object
4 articleType 1978 non-null object
5 baseColour 1978 non-null object
6 season 1978 non-null object
7 year 1978 non-null int64
8 usage 1978 non-null object
9 productDisplayName 1978 non-null object
dtypes: int64(2), object(8)
memory usage: 170.0+ KB
   id     gender  masterCategory  subCategory  articleType  baseColour  season  year  usage   productDisplayName
0  15970  Men     Apparel         Topwear      Shirts       Navy Blue   Fall    2011  Casual  Turtle Check Men Navy Blue Shirt
1  39386  Men     Apparel         Bottomwear   Jeans        Blue        Summer  2012  Casual  Peter England Men Party Blue Jeans
df.info()
<class 'pandas.core.frame.DataFrame'>
Index: 1978 entries, 0 to 1998
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 product_id 1978 non-null int64
1 gender 1978 non-null object
2 masterCategory 1978 non-null object
3 subCategory 1978 non-null object
4 articleType 1978 non-null object
5 baseColour 1978 non-null object
6 season 1978 non-null object
7 year 1978 non-null int64
8 usage 1978 non-null object
9 productDisplayName 1978 non-null object
10 product_text 1978 non-null object
dtypes: int64(2), object(9)
memory usage: 185.4+ KB
# check out one of the texts we will use to create semantic embeddings
df["product_text"][0]
'name turtle check men navy blue shirt category apparel subcategory topwear color navy blue gen
Connect to Redis
Now that we have our Redis database running, we can connect to it using the Redis-py client.
We will use the default host and port for the Redis database which is localhost:6379 .
import redis
from redis.commands.search.indexDefinition import (
IndexDefinition,
IndexType
)
from redis.commands.search.query import Query
from redis.commands.search.field import (
TagField,
NumericField,
TextField,
VectorField
)
REDIS_HOST = "localhost"
REDIS_PORT = 6379
REDIS_PASSWORD = "" # default for passwordless Redis
# Connect to Redis
redis_client = redis.Redis(
host=REDIS_HOST,
port=REDIS_PORT,
password=REDIS_PASSWORD
)
redis_client.ping()
True
The below cells will show how to specify and create a search index in Redis. We will first set
some constants for defining our index, like the distance metric and the index name, and then
define the fields and create the index itself (a sketch of that step follows the constants).
# Constants
INDEX_NAME = "product_embeddings" # name of the search index
PREFIX = "doc" # prefix for the document keys
DISTANCE_METRIC = "L2" # distance metric for the vectors (ex. COSINE, IP, L2)
NUMBER_OF_VECTORS = len(df)
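The index creation step is sketched below using the redis-py search fields imported earlier; the vector dimension and the exact field choices are assumptions inferred from the queries later in this notebook:
# assumed dimension of text-embedding-3-small embeddings
VECTOR_DIM = 1536

# fields we want to be able to filter and search on
name_field = TextField("productDisplayName")
category_field = TagField("masterCategory")
article_field = TagField("articleType")
gender_field = TagField("gender")
season_field = TagField("season")
year_field = NumericField("year")
embedding_field = VectorField(
    "product_vector",
    "FLAT",  # brute-force index; HNSW is the other option
    {
        "TYPE": "FLOAT32",
        "DIM": VECTOR_DIM,
        "DISTANCE_METRIC": DISTANCE_METRIC,
        "INITIAL_CAP": NUMBER_OF_VECTORS,
    },
)

# create the index only if it does not exist yet
try:
    redis_client.ft(INDEX_NAME).info()
    print("Index already exists")
except Exception:
    redis_client.ft(INDEX_NAME).create_index(
        fields=[name_field, category_field, article_field, gender_field,
                season_field, year_field, embedding_field],
        definition=IndexDefinition(prefix=[PREFIX], index_type=IndexType.HASH),
    )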
Now that we have a search index, we can load documents into it. We will use the dataframe
containing the styles dataset loaded previously. In Redis, either the HASH or JSON (if using
RedisJSON in addition to RediSearch) data types can be used to store documents. We will use
the HASH data type in this example. The cells below will show how to get OpenAI embeddings
for the different products and load documents into the index.
# Use Redis pipelines to batch calls and save on round trip network communication
pipe = client.pipeline()
for idx, doc in enumerate(records, start=1):
    key = f"{prefix}:{str(doc['product_id'])}"
%%time
index_documents(redis_client, PREFIX, df)
print(f"Loaded {redis_client.info()['db0']['keys']} documents in Redis search index with name: {INDEX
Now that we have a search index and documents loaded into it, we can run search queries.
Below we will provide a function that will run a search query and return the results. Using this
function we run a few queries that will show how you can utilize Redis as a vector database.
def search_redis(
    redis_client: redis.Redis,
    user_query: str,
    index_name: str = "product_embeddings",
    vector_field: str = "product_vector",
    return_fields: list = ["productDisplayName", "masterCategory", "gender", "season", "year", "vector_score"],
    hybrid_fields = "*",
    k: int = 20,
    print_results: bool = True,
) -> List[dict]:
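    # Sketch of a possible body (the exact query construction is an assumption):
    # embed the user query with the same model used at indexing time
    # (oai_client is the OpenAI client from the indexing sketch above)
    embedded_query = oai_client.embeddings.create(
        input=[user_query], model=EMBEDDING_MODEL
    ).data[0].embedding

    # combine the optional filters (hybrid_fields) with a KNN clause on the vector field
    base_query = f"{hybrid_fields}=>[KNN {k} @{vector_field} $vector AS vector_score]"
    query = (
        Query(base_query)
        .return_fields(*return_fields)
        .sort_by("vector_score")
        .paging(0, k)
        .dialect(2)
    )
    params_dict = {"vector": np.array(embedded_query, dtype=np.float32).tobytes()}

    # run the search and optionally print the results
    results = redis_client.ft(index_name).search(query, query_params=params_dict)
    if print_results:
        for i, doc in enumerate(results.docs):
            print(f"{i}. {doc.productDisplayName} (Distance: {round(float(doc.vector_score), 3)})")
    return results.docs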
The previous examples showed how to run vector search queries with RediSearch. In this section,
we will show how to combine vector search with other RediSearch fields for hybrid search. In the
example below, we will combine vector search with full text search.
# improve search quality by adding hybrid query for "man blue jeans" in the product vector combined with a text filter on productDisplayName
results = search_redis(redis_client,
"man blue jeans",
vector_field="product_vector",
k=10,
hybrid_fields='@productDisplayName:"blue jeans"'
)
# hybrid query for shirt in the product vector and only include results with the phrase "slim fit" in the productDisplayName
results = search_redis(redis_client,
"shirt",
vector_field="product_vector",
k=10,
hybrid_fields='@productDisplayName:"slim fit"'
)
# hybrid query for watch in the product vector and only include results with the tag "Accessories" in the masterCategory field
results = search_redis(redis_client,
"watch",
vector_field="product_vector",
k=10,
hybrid_fields='@masterCategory:{Accessories}'
)
# hybrid query for sandals in the product vector and only include results within the 2011-2012 year range
results = search_redis(redis_client,
"sandals",
vector_field="product_vector",
k=10,
hybrid_fields='@year:[2011 2012]'
)
# hybrid query for sandals in the product vector and only include results within the 2011-2012 year range and the Summer season
results = search_redis(redis_client,
"blue sandals",
vector_field="product_vector",
k=10,
hybrid_fields='(@year:[2011 2012] @season:{Summer})'
)
# hybrid query for a brown belt filtering results by a year (NUMERIC) with specific article types (TAG) and a match on the product display name (TEXT)
results = search_redis(redis_client,
"brown belt",
vector_field="product_vector",
k=10,
hybrid_fields='(@year:[2012 2012] @articleType:{Shirts | Belts} @productDispla
)
This notebook guides you step by step on using Qdrant as a vector database for OpenAI
embeddings. Qdrant is a high-performance vector search database written in Rust. It offers
RESTful and gRPC APIs to manage your embeddings. There is an official Python qdrant-client
that eases the integration with your apps.
4. Using Qdrant to perform the nearest neighbour search in the created collection.
What is Qdrant
Qdrant is an Open Source vector database that allows storing neural embeddings along with
their metadata, a.k.a. payload. Payloads are not only useful for keeping additional
attributes of a particular point, but can also be used for filtering. Qdrant offers a unique
filtering mechanism which is built into the vector search phase, which makes it really efficient.
Deployment options
Qdrant might be launched in various ways, depending on the target load on the application it
might be hosted:
Integration
Qdrant provides both RESTful and gRPC APIs which makes integration easy, no matter the
programming language you use. However, there are some official clients for the most popular
languages available, and if you use Python then the Python Qdrant client library might be the
best choice.
Prerequisites
We're going to use a local Qdrant instance running in a Docker container. The easiest way to
launch it is to use the attached docker-compose.yaml file and run the following command:
! docker-compose up -d
qdrant_qdrant_1 is up-to-date
We might validate if the server was launched successfully by running a simple curl command:
! curl http://localhost:6333
This notebook obviously requires the openai and qdrant-client packages, but there are also
some other additional libraries we will use. The following command installs them all:
The OpenAI API key is used for vectorization of the documents and queries.
If you don't have an OpenAI API key, you can get one from
https://beta.openai.com/account/api-keys.
Once you get your key, please add it to your environment variables as OPENAI_API_KEY by
running following command:
# Test that your OpenAI API key is correctly set as an environment variable
# Note. if you run this notebook locally, you will need to reload your terminal and the notebook for the env variables to be live.
import os
# Note. alternatively you can set a temporary env variable like this:
# os.environ["OPENAI_API_KEY"] = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
OPENAI_API_KEY is ready
Connect to Qdrant
Connecting to a running instance of Qdrant server is easy with the official Python library:
import qdrant_client
client = qdrant_client.QdrantClient(
host="localhost",
prefer_grpc=True,
)
client.get_collections()
CollectionsResponse(collections=[])
Load data
In this section we are going to load the data prepared previous to this session, so you don't
have to recompute the embeddings of Wikipedia articles with your own credits.
import wget

embeddings_url = "https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip"
wget.download(embeddings_url)
import zipfile
import pandas as pd
from ast import literal_eval

article_df = pd.read_csv('../data/vector_database_wikipedia_articles_embedded.csv')
# Read vectors from strings back into a list
article_df["title_vector"] = article_df.title_vector.apply(literal_eval)
article_df["content_vector"] = article_df.content_vector.apply(literal_eval)
article_df.head()
Index data
Qdrant stores data in collections, where each object is described by at least one vector and may
carry additional metadata called a payload. Our collection will be called Articles and each
object will be described by both title and content vectors. Qdrant does not require you to set
up any kind of schema beforehand, so you can freely put points into the collection with a simple
setup only.
We will start by creating a collection, and then we will fill it with our precomputed
embeddings.
from qdrant_client.http import models as rest

vector_size = len(article_df["content_vector"][0])
client.recreate_collection(
collection_name="Articles",
vectors_config={
"title": rest.VectorParams(
distance=rest.Distance.COSINE,
size=vector_size,
),
"content": rest.VectorParams(
distance=rest.Distance.COSINE,
size=vector_size,
),
}
)
True
client.upsert(
collection_name="Articles",
points=[
rest.PointStruct(
id=k,
vector={
"title": v["title_vector"],
"content": v["content_vector"],
},
payload=v.to_dict(),
)
for k, v in article_df.iterrows()
],
)
# Check the collection size to make sure all the points have been stored
client.count(collection_name="Articles")
CountResult(count=25000)
Search data
Once the data is put into Qdrant we will start querying the collection for the closest vectors. We
may provide an additional parameter vector_name to switch from title to content based search.
Since the precomputed embeddings were created with text-embedding-3-small OpenAI model
we also have to use it during search.
import openai

def query_qdrant(query, collection_name, vector_name="title", top_k=20):
    # create an embedding for the user query with the same model used at indexing time
    # (assumes the pre-1.0 openai client, matching the rest of this notebook)
    embedded_query = openai.Embedding.create(
        input=query,
        model="text-embedding-3-small",
    )["data"][0]["embedding"]

    query_results = client.search(
        collection_name=collection_name,
        query_vector=(
            vector_name, embedded_query
        ),
        limit=top_k,
    )
    return query_results
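With that helper in place, a search might look like this (the query string is illustrative):
query_results = query_qdrant("modern art in Europe", "Articles")
for i, article in enumerate(query_results):
    print(f"{i + 1}. {article.payload['title']} (Score: {round(article.score, 3)})")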
Large language models are functions that map text to text. Given an input string of text, a large
language model predicts the text that should come next.
The magic of large language models is that by being trained to minimize this prediction error
over vast quantities of text, the models end up learning concepts useful for these predictions.
For example, they learn:
how to spell
how to paraphrase
how to code
etc.
They do this by “reading” a large amount of existing text and learning how words tend to
appear in context with other words; they then use what they have learned to predict the next
most likely word that might appear in response to a user request, and each subsequent word
after that.
GPT-3 and GPT-4 power many software products, including productivity apps, education apps,
games, and more.
How to control a large language model
Of all the inputs to a large language model, by far the most influential is the text prompt.
Completion: Induce the model to complete the beginning of what you want
Instruction prompts
Write your instruction at the top of the prompt (or at the bottom, or both), and the model will
do its best to follow the instruction and then stop. Instructions can be detailed, so don't be
afraid to write a paragraph explicitly detailing the output you want, just stay aware of how many
tokens the model can process.
Output:
Ted Chiang
“Some humans theorize that intelligent species go extinct before they can expand into outer space.
― Ted Chiang, Exhalation
The author of this quote is
Output:
Ted Chiang
Giving the model a scenario to follow or role to play out can be helpful for complex queries or
when seeking imaginative responses. When using a hypothetical prompt, you set up a situation,
problem, or story, and then ask the model to respond as if it were a character in that scenario or
an expert on the topic.
Your role is to extract the name of the author from any given text
“Some humans theorize that intelligent species go extinct before they can expand into outer space.
― Ted Chiang, Exhalation
Output:
Ted Chiang
Demonstration prompt example (few-shot learning)
Similar to completion-style prompts, demonstrations can show the model what you want it to
do. This approach is sometimes called few-shot learning, as the model learns from a few
examples provided in the prompt.
Quote:
“When the reasoning mind is forced to confront the impossible again and again, it has no choice but
― N.K. Jemisin, The Fifth Season
Author: N.K. Jemisin
Quote:
“Some humans theorize that intelligent species go extinct before they can expand into outer space.
― Ted Chiang, Exhalation
Author:
Output:
Ted Chiang
Example fine-tuned prompt (for a model that has been custom trained on similar prompt-
completion pairs):
“Some humans theorize that intelligent species go extinct before they can expand into outer space.
― Ted Chiang, Exhalation
###
Output:
Ted Chiang
Code Capabilities
Large language models aren't only great at text - they can be great at code too. OpenAI's GPT-
4 model is a prime example.
Cursor (build software faster in an editor designed for pair-programming with AI)
GPT-4 is more advanced than previous models like gpt-3.5-turbo-instruct . But, to get the
best out of GPT-4 for coding tasks, it's still important to give clear and specific instructions. As a
result, designing good prompts can take more care.
In general, the input prompt is the best lever for improving model outputs. You can try tricks
like the following (a short example that applies them comes after this list):
Be more specific E.g., if you want the output to be a comma separated list, ask it to return a
comma separated list. If you want it to say "I don't know" when it doesn't know the answer,
tell it 'Say "I don't know" if you do not know the answer.' The more specific your
instructions, the better the model can respond.
Provide Context: Help the model understand the bigger picture of your request. This could
be background information, examples/demonstrations of what you want or explaining the
purpose of your task.
Ask the model to answer as if it was an expert. Explicitly asking the model to produce high
quality output or output as if it was written by an expert can induce the model to give
higher quality answers that it thinks an expert would write. Phrases like "Explain in detail" or
"Describe step-by-step" can be effective.
Prompt the model to write down the series of steps explaining its reasoning. If
understanding the 'why' behind an answer is important, prompt the model to include its
reasoning. This can be done by simply adding a line like "Let's think step by step" before
each answer.
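A minimal sketch that combines several of these tips in a single request (the prompt, model choice and parameters are illustrative):
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

prompt = (
    "You are an expert data engineer. "                                   # answer as an expert
    "Explain step by step how you would deduplicate a large CSV file, "   # ask for the reasoning
    "then return the final answer as a comma separated list of steps. "   # be specific about the format
    'Say "I don\'t know" if you are not sure.'                            # allow an explicit fallback
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
)
print(response.choices[0].message.content)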
Given a text string (e.g., "tiktoken is great!" ) and an encoding (e.g., "cl100k_base" ), a
tokenizer can split the text string into a list of tokens (e.g., ["t", "ik", "token", " is", "
great", "!"] ).
Splitting text strings into tokens is useful because GPT models see text in the form of tokens.
Knowing how many tokens are in a text string can tell you (a) whether the string is too long for
a text model to process and (b) how much an OpenAI API call costs (as usage is priced by
token).
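For example, a small helper along these lines counts tokens for a given encoding (the default encoding name here is just a common choice, not a requirement):
import tiktoken

def num_tokens_from_string(string: str, encoding_name: str = "cl100k_base") -> int:
    """Return the number of tokens in a text string for a given encoding."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(string))

num_tokens_from_string("tiktoken is great!")  # -> 6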
Encodings
Encodings specify how text is converted into tokens. Different models use different encodings.
You can retrieve the encoding for a model using tiktoken.encoding_for_model() as follows:
encoding = tiktoken.encoding_for_model('gpt-3.5-turbo')
Note that p50k_base overlaps substantially with r50k_base , and for non-code applications,
they will usually give the same tokens.
Python: tiktoken
Java: jtokkit
Golang: tiktoken-go
Rust: tiktoken-rs
JavaScript: gpt-3-encoder
Java: gpt2-tokenizer-java
PHP: GPT-3-Encoder-PHP
Golang: tiktoken-go
Rust: tiktoken-rs
In English, tokens commonly range in length from one character to one word (e.g., "t" or "
great" ), though in some languages tokens can be shorter than one character or longer than
one word. Spaces are usually grouped with the starts of words (e.g., " is" instead of "is " or
" " + "is" ). You can quickly check how a string is tokenized at the OpenAI Tokenizer, or the
third-party Tiktokenizer webapp.
0. Install tiktoken
1. Import tiktoken
import tiktoken
2. Load an encoding
The first time this runs, it will require an internet connection to download. Later runs won't need
an internet connection.
encoding = tiktoken.get_encoding("cl100k_base")
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
The .encode() method converts a text string into a list of token integers.
encoding.encode("tiktoken is great!")
The .decode() method converts a list of token integers back into a string, so round-tripping the
example above recovers the original text:
encoding.decode(encoding.encode("tiktoken is great!"))
'tiktoken is great!'
Warning: although .decode() can be applied to single tokens, beware that it can be lossy for
tokens that aren't on utf-8 boundaries.
For single tokens, .decode_single_token_bytes() safely converts a single integer token to the
bytes it represents.
(The b in front of the strings indicates that the strings are byte strings.)
5. Comparing encodings
Different encodings vary in how they split words, group spaces, and handle non-English
characters. Using the methods above, we can compare different encodings on a few example
strings.
compare_encodings("antidisestablishmentarianism")
r50k_base: 5 tokens
token integers: [415, 29207, 44390, 3699, 1042]
token bytes: [b'ant', b'idis', b'establishment', b'arian', b'ism']
p50k_base: 5 tokens
token integers: [415, 29207, 44390, 3699, 1042]
token bytes: [b'ant', b'idis', b'establishment', b'arian', b'ism']
cl100k_base: 6 tokens
token integers: [519, 85342, 34500, 479, 8997, 2191]
token bytes: [b'ant', b'idis', b'establish', b'ment', b'arian', b'ism']
compare_encodings("2 + 2 = 4")
r50k_base: 5 tokens
token integers: [17, 1343, 362, 796, 604]
token bytes: [b'2', b' +', b' 2', b' =', b' 4']
p50k_base: 5 tokens
token integers: [17, 1343, 362, 796, 604]
token bytes: [b'2', b' +', b' 2', b' =', b' 4']
cl100k_base: 7 tokens
token integers: [17, 489, 220, 17, 284, 220, 19]
token bytes: [b'2', b' +', b' ', b'2', b' =', b' ', b'4']
compare_encodings("お誕生日おめでとう")
r50k_base: 14 tokens
token integers: [2515, 232, 45739, 243, 37955, 33768, 98, 2515, 232, 1792, 223, 30640, 30201, 2
token bytes: [b'\xe3\x81', b'\x8a', b'\xe8\xaa', b'\x95', b'\xe7\x94\x9f', b'\xe6\x97', b'\xa5'
p50k_base: 14 tokens
token integers: [2515, 232, 45739, 243, 37955, 33768, 98, 2515, 232, 1792, 223, 30640, 30201, 2
token bytes: [b'\xe3\x81', b'\x8a', b'\xe8\xaa', b'\x95', b'\xe7\x94\x9f', b'\xe6\x97', b'\xa5'
cl100k_base: 9 tokens
token integers: [33334, 45918, 243, 21990, 9080, 33334, 62004, 16556, 78699]
token bytes: [b'\xe3\x81\x8a', b'\xe8\xaa', b'\x95', b'\xe7\x94\x9f', b'\xe6\x97\xa5', b'\xe3\x
ChatGPT models like gpt-3.5-turbo and gpt-4 use tokens in the same way as older
completions models, but because of their message-based formatting, it's more difficult to count
how many tokens will be used by a conversation.
Below is an example function for counting tokens for messages passed to gpt-3.5-turbo or
gpt-4 .
Note that the exact way that tokens are counted from messages may change from model to
model. Consider the counts from the function below an estimate, not a timeless guarantee.
In particular, requests that use the optional functions input will consume extra tokens on top of
the estimates calculated below.
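The counting function used below is not reproduced in full here; a simplified sketch that uses the per-message overhead of the cl100k_base chat models is shown instead (the full cookbook version also special-cases older snapshots such as gpt-3.5-turbo-0301, which is why 127 tokens are reported for it in the output further down):
import tiktoken

def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Return an estimate of the tokens used by a list of chat messages (simplified sketch)."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        encoding = tiktoken.get_encoding("cl100k_base")
    tokens_per_message = 3  # every message is wrapped with <|start|>{role/name}\n{content}<|end|>\n
    tokens_per_name = 1     # an extra token when a name field is present
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens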
# let's verify the function above matches the OpenAI API response
from openai import OpenAI
import os
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
example_messages = [
{
"role": "system",
"content": "You are a helpful, pattern-following assistant that translates corporate jargon i
},
{
"role": "system",
"name": "example_user",
"content": "New synergies will help drive top-line growth.",
},
{
"role": "system",
"name": "example_assistant",
"content": "Things working well together will increase revenue.",
},
{
"role": "system",
"name": "example_user",
"content": "Let's circle back when we have more bandwidth to touch base on opportunities for
},
{
"role": "system",
"name": "example_assistant",
"content": "Let's talk later when we're less busy about how to do better.",
},
{
"role": "user",
"content": "This late pivot means we don't have time to boil the ocean for the client deliver
},
]
for model in [
"gpt-3.5-turbo-0301",
"gpt-3.5-turbo-0613",
"gpt-3.5-turbo",
"gpt-4-0314",
"gpt-4-0613",
"gpt-4",
]:
print(model)
# example token count from the function defined above
print(f"{num_tokens_from_messages(example_messages, model)} prompt tokens counted by num_tokens_f
# example token count from the OpenAI API
response = client.chat.completions.create(model=model,
messages=example_messages,
temperature=0,
max_tokens=1)
print(f'{response.usage.prompt_tokens} prompt tokens counted by the OpenAI API.')
print()
gpt-3.5-turbo-0301
127 prompt tokens counted by num_tokens_from_messages().
127 prompt tokens counted by the OpenAI API.
gpt-3.5-turbo-0613
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
gpt-3.5-turbo
Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
gpt-4-0314
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
gpt-4-0613
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
gpt-4
Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
DALL·E-3 is the latest version of our DALL-E text-to-image generation models. As the current
state of the art in text-to-image generation, DALL·E is capable of generating high-quality
images across a wide variety of domains. If you're interested in more technical details of how
DALL·E-3 was built, you can read more about it in our research paper. I'll be going over some of
the new features and capabilities of DALL·E-3 in this article, as well as some examples of what
new products you can build with the API.
As a reminder, the Image generation API hasn't changed and maintains the same endpoints and
formatting as with DALL·E-2. If you're looking for a guide on how to use the Image API, see the
Cookbook article on the subject.
The only API endpoint available for use with DALL·E-3 right now is Generations
(/v1/images/generations). We don’t support variations or inpainting yet, though the Edits and
Variations endpoints are available for use with DALL·E-2.
Generations
The generation API endpoint creates an image based on a text prompt. There are a couple of new
parameters that we've added to enhance what you can create with our models. Here's a quick
overview of the options:
New parameters:
model (‘dall-e-2’ or ‘dall-e-3’): This is the model you’re generating with. Be careful to set it
to ‘dall-e-3’ as it defaults to ‘dall-e-2’ if empty.
style (‘natural’ or ‘vivid’): The style of the generated images. Must be one of vivid or natural.
Vivid causes the model to lean towards generating hyper-real and dramatic images. Natural
causes the model to produce more natural, less hyper-real looking images. Defaults to
‘vivid’.
quality (‘standard’ or ‘hd’): The quality of the image that will be generated. ‘hd’ creates
images with finer details and greater consistency across the image. Defaults to ‘standard’.
Other parameters:
prompt (str): A text description of the desired image(s). The maximum length is 1000
characters. Required field.
n (int): The number of images to generate. Must be between 1 and 10. Defaults to 1. For
dall-e-3, only n=1 is supported.
size (...): The size of the generated images. Must be one of 256x256, 512x512, or 1024x1024
for DALL·E-2 models. Must be one of 1024x1024, 1792x1024, or 1024x1792 for DALL·E-3
models.
response_format ('url' or 'b64_json'): The format in which the generated images are
returned. Must be one of "url" or "b64_json". Defaults to "url".
user (str): A unique identifier representing your end-user, which will help OpenAI to
monitor and detect abuse. Learn more.
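Putting those parameters together, a minimal generation request with the Python SDK might look like the sketch below; the prompt is just an illustrative placeholder:
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.images.generate(
    model="dall-e-3",          # be explicit: an empty model defaults to dall-e-2
    prompt="A minimalist vector logo of a Greek statue on a white background",  # illustrative
    size="1024x1024",
    quality="hd",
    style="natural",
    n=1,                       # dall-e-3 only supports n=1
    response_format="url",
)
print(response.data[0].url)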
New Features
Our launch of DALL·E-3 comes with lots of new features and capabilities to help you generate
the images you want. Here’s a quick overview of what’s new:
Prompt Rewriting
A new feature in the latest DALL·E-3 API is prompt rewriting, where we use GPT-4 to optimize all
of your prompts before they're passed to DALL·E. In our research, we've seen that using very
detailed prompts gives significantly better results. You can read more about our captioning,
prompting, and safety mitigations in the DALL·E-3 research paper.
Keep in mind that this feature can't be disabled at the moment, though you can achieve a
high level of fidelity by simply giving instructions to the relabeler in your prompt, as I'll show
below with examples.
Standard vs HD Quality
DALL·E-3 introduces a new 'quality' parameter that allows you to adjust the level of detail and
organization in all of your generations. The 'standard' quality generations are the DALL·E-3
you're familiar with, with 'hd' generations bringing a new level of attention to detail and
adherence to your prompt. Keep in mind that setting your generation quality to ‘hd’ increases
the cost per image and often adds roughly 10 seconds to the generation time.
For example, here we have two different icons in 'hd' and 'standard' quality. Often the choice
between either quality is up to taste, but 'hd' often wins when the task requires more ability to
capture details and textures or better composition of a scene.
Here's another example, this time with a prompt of 'An infinite, uniform grid of tessellated
cubes.', which DALL·E conveniently rewrites as "An infinite, uniform grid of tessellated cubes
painted carefully in an isometric perspective. The cubes are meticulously arranged in such a way
that they seem to stretch endlessly into the distance. Each cube is identical to the next, with light
reflecting consistently across all surfaces, underscoring their uniformity. This is a digitally rendered
image.":
New Sizes
DALL·E-3 accepts three different image sizes: 1024px by 1024px, 1792px by 1024px, and 1024px
by 1792px. Beyond giving more flexibility in terms of aspect ratio, these sizes can have
significant effects on the style and context of your generated image. For example, vertical
images might work better when you’re looking for an image that looks like it was taken by a
cellphone camera, or horizontal images may work better for landscape paintings or digital
designs.
To demonstrate this difference, here are multiple variations on the same input prompt with
different aspect ratios. In this case, my prompt was: “Professional photoshoot of a Chemex
brewer in the process of brewing coffee.” (For reference, this is a photo of a real Chemex
brewer).
Here is the generation in square form (in both HD and standard qualities):
You can see how these images are framed closely to the item and seem to be taken in a more
closed space with various surrounding items nearby.
Here are the results on the same prompts with a wider aspect ratio:
Compared to the previous generations, these come in the form of close-ups. The background is
blurred, with greater focus on the item itself, more like professionally organized photoshoots
rather than quick snaps.
Lastly, we have the vertical aspect ratio:
These feel more akin to cellphone images, with a more candid appearance. There’s more action
involved: the slowly dripping coffee or the active pour from the pot.
New Styles
DALL·E-3 introduces two new styles: natural and vivid. The natural style is more similar to the
DALL·E-2 style in its 'blander' realism, while the vivid style is a new style that leans towards
generating hyper-real and cinematic images. For reference, all DALL·E generations in ChatGPT
are generated in the 'vivid' style.
The natural style is specifically useful in cases where DALL·E-3 over-exaggerates or confuses a
subject that's supposed to be more simple, subdued, or realistic. I've often used it for logo
generation, stock photos, or other cases where I'm trying to match a real-world object.
Here's an example of the same prompt as above in the vivid style. The vivid is far more
cinematic (and looks great), but might pop too much if you're not looking for that.
There are many cases in which I prefer the natural style, such as this example of a painting in the
style of Thomas Cole's 'Desolation':
Examples and Prompts
To help you get started building with DALL·E-3, I've come up with a few examples of products
you could build with the API, as well as collected some styles and capabilities that seem to be
unique to DALL·E-3 at the moment. I've also listed some subjects that I'm struggling to prompt
DALL·E-3 to generate in case you want to try your hand at it.
Icon Generation
Have you ever struggled to find the perfect icon for your website or app? It would be awesome
to see a custom icon generator app that lets you pick the style, size, and subject of your icon,
and then generates a custom SVG from the DALL·E generation. Here are some examples of helpful
website icons I generated with DALL·E-3:
In this case, I used Potrace to convert the images to SVGs, which you can download here. This is
what I used to convert the images:
You might need to boost the brightness and contrast of the image before converting it to an
SVG. I used the following command to do so:
Logo Generation
DALL·E-3 is great at jumpstarting the logo creation process for your company or product. By
prompting DALL·E to create 'Vector logo design of a Greek statue, minimalistic, with a white
background' I achieved the following:
Here's another logo I created, this time for an Arabian coffee shop:
In the case of iterating on an existing logo, I took OpenAI's logo, asked GPT-4V to describe it,
and then asked DALL·E to generate variations on the logo:
Custom Tattoos
DALL·E-3 is great at generating line art, which might be useful for generating custom tattoos.
Here's some line art I generated with DALL·E-3:
Here's what I managed to create. They might need some work, but I think they're a good start:
And much more...
Here are some ideas I've had that I haven't had time to try yet:
Custom emojis or Twitch emotes?
Vector illustrations?
Album art?
Showcase
We're really just starting to figure out what DALL·E-3 is capable of. Here are some of the best
styles, generations, and prompts I've seen so far. I've been unable to locate the original authors
of some of these images, so if you know who created them, please let me know!
Sources:
@scharan79 on Reddit
@TalentedJuli on Reddit
@Wild-Culture-5068 on Reddit
@popsicle_pope on Reddit
@gopatrik on Twitter
@ARTiV3RSE on Twitter
@willdepue on Twitter
Various OpenAI employees
Challenges
DALL·E-3 is still very new and there are still a lot of things it struggles with (or maybe I just haven't
figured out how to prompt it correctly yet). Here are some challenges you might want to try
your hand at:
Web Design
DALL·E really struggles to generate realistic-looking websites, apps, etc., and often generates what
looks like a portfolio page of a web designer. Here's the best I've gotten so far:
Seamless Textures
It feels like DALL·E-3 is so close to being able to generate seamless textures. Often they come
out great, just slightly cut off or with a few artifacts. See the examples below:
Fonts
Using DALL·E to generate custom fonts or iterate on letter designs could be really cool, but I
haven't been able to get it to work yet. Here's the best I've gotten so far:
More Resources
Thanks for reading! If you're looking for more resources on DALL·E-3, here are some related
links:
The new Assistants API is a stateful evolution of our Chat Completions API meant to simplify
the creation of assistant-like experiences, and enable developer access to powerful tools like
Code Interpreter and Retrieval.
The primitives of the Chat Completions API are Messages , on which you perform a Completion
with a Model ( gpt-3.5-turbo , gpt-4 , etc). It is lightweight and powerful, but inherently
stateless, which means you have to manage conversation state, tool definitions, retrieval
documents, and code execution manually.
The primitives of the Assistants API are:
Assistants , which encapsulate a base model, instructions, tools, and (context) documents,
Threads , which represent the state of a conversation, and
Runs , which power the execution of an Assistant on a Thread, including responses and tool use.
We'll take a look at how these can be used to create powerful, stateful experiences.
Setup
Python SDK
“Note We've updated our Python SDK to add support for the Assistants API, so you'll need
to update it to the latest version ( 1.2.3 at time of writing).”
Version: 1.2.3
import json
def show_json(obj):
display(json.loads(obj.model_dump_json()))
Assistants
The easiest way to get started with the Assistants API is through the Assistants Playground.
Let's begin by creating an assistant! We'll create a Math Tutor just like in our docs.
You can view Assistants you've created in the Assistants Dashboard.
You can also create Assistants directly through the Assistants API, like so:
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
assistant = client.beta.assistants.create(
name="Math Tutor",
instructions="You are a personal math tutor. Answer questions briefly, in a sentence or less.",
model="gpt-4-1106-preview",
)
show_json(assistant)
{'id': 'asst_9HAjl9y41ufsViNcThW1EXUS',
'created_at': 1699828331,
'description': None,
'file_ids': [],
'instructions': 'You are a personal math tutor. Answer questions briefly, in a sentence or les
'metadata': {},
'model': 'gpt-4-1106-preview',
'name': 'Math Tutor',
'object': 'assistant',
'tools': []}
Regardless of whether you create your Assistant through the Dashboard or with the API, you'll
want to keep track of the Assistant ID. This is how you'll refer to your Assistant throughout
Threads and Runs.
Next, we'll create a new Thread and add a Message to it. This will hold the state of our
conversation, so we don't have to re-send the entire message history each time.
Threads
thread = client.beta.threads.create()
show_json(thread)
{'id': 'thread_bw42vPoQtYBMQE84WubNcJXG',
'created_at': 1699828331,
'metadata': {},
'object': 'thread'}
message = client.beta.threads.messages.create(
thread_id=thread.id,
role="user",
content="I need to solve the equation `3x + 11 = 14`. Can you help me?",
)
show_json(message)
{'id': 'msg_IBiZDAWHhWPewxzN0EfTYNew',
'assistant_id': None,
'content': [{'text': {'annotations': [],
'value': 'I need to solve the equation `3x + 11 = 14`. Can you help me?'},
'type': 'text'}],
'created_at': 1699828332,
'file_ids': [],
'metadata': {},
'object': 'thread.message',
'role': 'user',
'run_id': None,
'thread_id': 'thread_bw42vPoQtYBMQE84WubNcJXG'}
“Note Even though you're no longer sending the entire history each time, you will still be
charged for the tokens of the entire conversation history with each Run.”
Runs
Notice how the Thread we created is not associated with the Assistant we created earlier!
Threads exist independently from Assistants, which may be different from what you'd expect if
you've used ChatGPT (where a thread is tied to a model/GPT).
To get a completion from an Assistant for a given Thread, we must create a Run. Creating a Run
will indicate to an Assistant it should look at the messages in the Thread and take action: either
by adding a single response, or using tools.
“Note Runs are a key difference between the Assistants API and Chat Completions API.
While in Chat Completions the model will only ever respond with a single message, in the
Assistants API a Run may result in an Assistant using one or multiple tools, and potentially
adding multiple messages to the Thread.”
To get our Assistant to respond to the user, let's create the Run. As mentioned earlier, you must
specify both the Assistant and the Thread.
run = client.beta.threads.runs.create(
thread_id=thread.id,
assistant_id=assistant.id,
)
show_json(run)
{'id': 'run_LA08RjouV3RemQ78UZXuyzv6',
'assistant_id': 'asst_9HAjl9y41ufsViNcThW1EXUS',
'cancelled_at': None,
'completed_at': None,
'created_at': 1699828332,
'expires_at': 1699828932,
'failed_at': None,
'file_ids': [],
'instructions': 'You are a personal math tutor. Answer questions briefly, in a sentence or les
'last_error': None,
'metadata': {},
'model': 'gpt-4-1106-preview',
'object': 'thread.run',
'required_action': None,
'started_at': None,
'status': 'queued',
'thread_id': 'thread_bw42vPoQtYBMQE84WubNcJXG',
'tools': []}
Unlike creating a completion in the Chat Completions API, creating a Run is an asynchronous
operation. It will return immediately with the Run's metadata, which includes a status that will
initially be set to queued . The status will be updated as the Assistant performs operations
(like using tools and adding messages).
To know when the Assistant has completed processing, we can poll the Run in a loop. (Support
for streaming is coming soon!) While here we are only checking for a queued or in_progress
status, in practice a Run may undergo a variety of status changes which you can choose to
surface to the user. (These are called Steps, and will be covered later.)
import time
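The polling cell itself isn't shown after the import above; a minimal loop (equivalent to the wait_on_run helper defined later in this notebook) would be:
# poll the Run until it leaves the queued/in_progress states
while run.status == "queued" or run.status == "in_progress":
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    time.sleep(0.5)
show_json(run)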
{'id': 'run_LA08RjouV3RemQ78UZXuyzv6',
'assistant_id': 'asst_9HAjl9y41ufsViNcThW1EXUS',
'cancelled_at': None,
'completed_at': 1699828333,
'created_at': 1699828332,
'expires_at': None,
'failed_at': None,
'file_ids': [],
'instructions': 'You are a personal math tutor. Answer questions briefly, in a sentence or les
'last_error': None,
'metadata': {},
'model': 'gpt-4-1106-preview',
'object': 'thread.run',
'required_action': None,
'started_at': 1699828332,
'status': 'completed',
'thread_id': 'thread_bw42vPoQtYBMQE84WubNcJXG',
'tools': []}
Messages
Now that the Run has completed, we can list the Messages in the Thread to see what got added
by the Assistant.
messages = client.beta.threads.messages.list(thread_id=thread.id)
show_json(messages)
As you can see, Messages are ordered in reverse-chronological order – this was done so the
most recent results are always on the first page (since results can be paginated). Do keep an eye
out for this, since it's the opposite order to messages in the Chat Completions API.
Let's ask our Assistant to explain the result a bit further!
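The cell that actually sends that follow-up isn't shown above; a sketch of it, reusing the same Thread and Assistant (the follow-up prompt is illustrative), would be:
# Add a follow-up user Message to the same Thread, run the Assistant again, and wait for it
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Could you explain this to me?",  # illustrative follow-up
)
run = client.beta.threads.runs.create(thread_id=thread.id, assistant_id=assistant.id)
while run.status in ("queued", "in_progress"):
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    time.sleep(0.5)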
# Retrieve all the messages added after our last user message
messages = client.beta.threads.messages.list(
thread_id=thread.id, order="asc", after=message.id
)
show_json(messages)
This may feel like a lot of steps to get a response back, especially for this simple example.
However, you'll soon see how we can add very powerful functionality to our Assistant without
changing much code at all!
Example
Let's take a look at how we could potentially put all of this together. Below is all the code you
need to use an Assistant you've created.
Since we've already created our Math Assistant, I've saved its ID in MATH_ASSISTANT_ID . I then
defined two functions:
submit_message : create a Message on a Thread, then start (and return) a new Run
get_response : returns the list of Messages in a Thread
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
def get_response(thread):
return client.beta.threads.messages.list(thread_id=thread.id, order="asc")
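The submit_message helper itself isn't reproduced above; a minimal version consistent with its description (create a user Message on the Thread, then start and return a Run) would be:
def submit_message(assistant_id, thread, user_message):
    # add the user's message to the Thread, then kick off a Run with the given Assistant
    client.beta.threads.messages.create(
        thread_id=thread.id, role="user", content=user_message
    )
    return client.beta.threads.runs.create(
        thread_id=thread.id,
        assistant_id=assistant_id,
    )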
I've also defined a create_thread_and_run function that I can re-use (which is actually almost
identical to the client.beta.threads.create_and_run compound function in our API ;) ). Finally,
we can submit our mock user requests each to a new Thread.
Notice how all of these API calls are asynchronous operations; this means we actually get async
behavior in our code without the use of async libraries! (e.g. asyncio )
def create_thread_and_run(user_input):
thread = client.beta.threads.create()
run = submit_message(MATH_ASSISTANT_ID, thread, user_input)
return thread, run
import time
# Waiting in a loop
def wait_on_run(run, thread):
while run.status == "queued" or run.status == "in_progress":
run = client.beta.threads.runs.retrieve(
thread_id=thread.id,
run_id=run.id,
)
time.sleep(0.5)
return run
# Messages
user: I need to solve the equation `3x + 11 = 14`. Can you help me?
assistant: Yes, subtract 11 from both sides to get `3x = 3`, then divide both sides by 3 to fin
# Messages
user: Could you explain linear algebra to me?
assistant: Linear algebra is the branch of mathematics that deals with vector spaces, linear eq
# Messages
user: I don't like math. What can I do?
assistant: Try finding aspects of math that relate to your interests or daily life, and conside
# Messages
user: I don't like math. What can I do?
assistant: Try finding aspects of math that relate to your interests or daily life, and conside
user: Thank you!
assistant: You're welcome! If you have any more questions, feel free to ask.
Et voilà!
You may have noticed that this code is not actually specific to our math Assistant at all... this
code will work for any new Assistant you create simply by changing the Assistant ID! That is the
power of the Assistants API.
Tools
A key feature of the Assistants API is the ability to equip our Assistants with Tools, like Code
Interpreter, Retrieval, and custom Functions. Let's take a look at each.
Code Interpreter
Let's equip our Math Tutor with the Code Interpreter tool, which we can do from the
Dashboard...
...or the API, using the Assistant ID.
assistant = client.beta.assistants.update(
MATH_ASSISTANT_ID,
tools=[{"type": "code_interpreter"}],
)
show_json(assistant)
{'id': 'asst_9HAjl9y41ufsViNcThW1EXUS',
'created_at': 1699828331,
'description': None,
'file_ids': [],
'instructions': 'You are a personal math tutor. Answer questions briefly, in a sentence or les
'metadata': {},
'model': 'gpt-4-1106-preview',
'name': 'Math Tutor',
'object': 'assistant',
'tools': [{'type': 'code_interpreter'}]}
# Messages
user: Generate the first 20 fibbonaci numbers with code.
assistant: The first 20 Fibonacci numbers are: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 23
And that's it! The Assistant used Code Interpreter in the background, and gave us a final
response.
For some use cases this may be enough – however, if we want more details on what precisely an
Assistant is doing we can take a look at a Run's Steps.
Steps
A Run is composed of one or more Steps. Like a Run, each Step has a status that you can
query. This is useful for surfacing the progress of a Step to a user (e.g. a spinner while the
Assistant is writing code or performing retrieval).
run_steps = client.beta.threads.runs.steps.list(
thread_id=thread.id, run_id=run.id, order="asc"
)
null
null
2. message_creation
The first Step is a tool_calls , specifically using the code_interpreter which contains:
input , which was the Python code generated before the tool was called, and
output , which was the result of running the Code Interpreter.
The second Step is a message_creation , which contains the message that was added to the
Thread to communicate the results to the user.
Retrieval
Another powerful tool in the Assistants API is Retrieval: the ability to upload files that the
Assistant will use as a knowledge base when answering questions. This can also be enabled
from the Dashboard or the API, where we can upload files we want to be used.
{'id': 'asst_9HAjl9y41ufsViNcThW1EXUS',
'created_at': 1699828331,
'description': None,
'file_ids': ['file-MdXcQI8OdPp76wukWI4dpLwW'],
'instructions': 'You are a personal math tutor. Answer questions briefly, in a sentence or les
'metadata': {},
'model': 'gpt-4-1106-preview',
'name': 'Math Tutor',
'object': 'assistant',
'tools': [{'type': 'code_interpreter'}, {'type': 'retrieval'}]}
# Messages
user: What are some cool math concepts behind this ML paper pdf? Explain in two sentences.
assistant: I am unable to find specific sections referring to "cool math concepts" directly in
assistant: The paper discusses leveraging large language models as a framework for unsupervised
“Note There are more intricacies in Retrieval, like Annotations, which may be covered in
another cookbook.”
Functions
As a final powerful tool for your Assistant, you can specify custom Functions (much like the
Function Calling in the Chat Completions API). During a Run, the Assistant can then indicate it
wants to call one or more functions you specified. You are then responsible for calling the
Function, and providing the output back to the Assistant.
Let's take a look at an example by defining a display_quiz() Function for our Math Tutor.
This function will take a title and an array of question s, display the quiz, and get input from
the user for each:
title
questions
question_text
Unfortunately I don't know how to get user input within a Python Notebook, so I'll be mocking
out responses with get_mock_response... . This is where you'd get the user's actual input.
def get_mock_response_from_user_multiple_choice():
    return "a"
def get_mock_response_from_user_free_response():
    return "I don't know."
def display_quiz(title, questions):
    print("Quiz:", title)
    print()
    responses = []
    for q in questions:
        print(q["question_text"])
        response = ""
        # mock the student's answer depending on the question type
        if q["question_type"] == "MULTIPLE_CHOICE":
            response = get_mock_response_from_user_multiple_choice()
        elif q["question_type"] == "FREE_RESPONSE":
            response = get_mock_response_from_user_free_response()
        responses.append(response)
        print()
    return responses
Now, let's define the interface of this function in JSON format, so our Assistant can call it:
function_json = {
"name": "display_quiz",
"description": "Displays a quiz to the student, and returns the student's response. A single quiz
"parameters": {
"type": "object",
"properties": {
"title": {"type": "string"},
"questions": {
"type": "array",
"description": "An array of questions, each with a title and potentially options (if
"items": {
"type": "object",
"properties": {
"question_text": {"type": "string"},
"question_type": {
"type": "string",
"enum": ["MULTIPLE_CHOICE", "FREE_RESPONSE"],
},
"choices": {"type": "array", "items": {"type": "string"}},
},
"required": ["question_text"],
},
},
},
"required": ["title", "questions"],
},
}
Once again, let's update our Assistant either through the Dashboard or the API.
“Note Pasting the function JSON into the Dashboard was a bit finicky due to indentation,
etc. I just asked ChatGPT to format my function the same as one of the examples on the
Dashboard :).”
assistant = client.beta.assistants.update(
MATH_ASSISTANT_ID,
tools=[
{"type": "code_interpreter"},
{"type": "retrieval"},
{"type": "function", "function": function_json},
],
)
show_json(assistant)
{'id': 'asst_9HAjl9y41ufsViNcThW1EXUS',
'created_at': 1699828331,
'description': None,
'file_ids': ['file-MdXcQI8OdPp76wukWI4dpLwW'],
'instructions': 'You are a personal math tutor. Answer questions briefly, in a sentence or les
'metadata': {},
'model': 'gpt-4-1106-preview',
'name': 'Math Tutor',
'object': 'assistant',
'tools': [{'type': 'code_interpreter'},
{'type': 'retrieval'},
{'function': {'name': 'display_quiz',
'parameters': {'type': 'object',
'properties': {'title': {'type': 'string'},
'questions': {'type': 'array',
'description': 'An array of questions, each with a title and potentially options (if mul
'items': {'type': 'object',
'properties': {'question_text': {'type': 'string'},
'question_type': {'type': 'string',
'enum': ['MULTIPLE_CHOICE', 'FREE_RESPONSE']},
'choices': {'type': 'array', 'items': {'type': 'string'}}},
'required': ['question_text']}}},
'required': ['title', 'questions']},
'description': "Displays a quiz to the student, and returns the student's response. A singl
'type': 'function'}]}
'requires_action'
Now, however, when we check the Run's status we see requires_action ! Let's take a closer look.
show_json(run)
{'id': 'run_98PGE3qGtHoaWaCLoytyRUBf',
'assistant_id': 'asst_9HAjl9y41ufsViNcThW1EXUS',
'cancelled_at': None,
'completed_at': None,
'created_at': 1699828370,
'expires_at': 1699828970,
'failed_at': None,
'file_ids': ['file-MdXcQI8OdPp76wukWI4dpLwW'],
'instructions': 'You are a personal math tutor. Answer questions briefly, in a sentence or les
'last_error': None,
'metadata': {},
'model': 'gpt-4-1106-preview',
'object': 'thread.run',
'required_action': {'submit_tool_outputs': {'tool_calls': [{'id': 'call_Zf650sWT1wW4Uwbf5YeDS0
'function': {'arguments': '{\n "title": "Mathematics Quiz",\n "questions": [\n {\n
'name': 'display_quiz'},
'type': 'function'}]},
'type': 'submit_tool_outputs'},
'started_at': 1699828370,
'status': 'requires_action',
'thread_id': 'thread_bICTESFvWoRdj0O0SzsosLCS',
'tools': [{'type': 'code_interpreter'},
{'type': 'retrieval'},
{'function': {'name': 'display_quiz',
'parameters': {'type': 'object',
'properties': {'title': {'type': 'string'},
'questions': {'type': 'array',
'description': 'An array of questions, each with a title and potentially options (if mul
The required_action field indicates a Tool is waiting for us to run it and submit its output back
to the Assistant. Specifically, the display_quiz function! Let's start by parsing the name and
arguments .
“Note While in this case we know there is only one Tool call, in practice the Assistant may
choose to call multiple tools.”
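The parsing cell isn't shown above; a sketch of it, pulling the single pending tool call out of required_action and feeding it to our display_quiz function, looks like this:
# Extract the single tool call waiting on us
tool_call = run.required_action.submit_tool_outputs.tool_calls[0]
name = tool_call.function.name
arguments = json.loads(tool_call.function.arguments)

print("Function Name:", name)
print("Function Arguments:", arguments)

# Run the actual function with the parsed arguments
responses = display_quiz(arguments["title"], arguments["questions"])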
Explain why the square root of a negative number is not a real number.
Great! (Remember these responses are the ones we mocked earlier. In reality, we'd be getting
input back from the user via this function call.)
Now that we have our responses, let's submit them back to the Assistant. We'll need the
tool_call ID, found in the tool_call we parsed out earlier. We'll also need to encode our list of
responses as a JSON string.
run = client.beta.threads.runs.submit_tool_outputs(
thread_id=thread.id,
run_id=run.id,
tool_outputs=[
{
"tool_call_id": tool_call.id,
"output": json.dumps(responses),
}
],
)
show_json(run)
{'id': 'run_98PGE3qGtHoaWaCLoytyRUBf',
'assistant_id': 'asst_9HAjl9y41ufsViNcThW1EXUS',
'cancelled_at': None,
'completed_at': None,
'created_at': 1699828370,
'expires_at': 1699828970,
'failed_at': None,
'file_ids': ['file-MdXcQI8OdPp76wukWI4dpLwW'],
'instructions': 'You are a personal math tutor. Answer questions briefly, in a sentence or les
'last_error': None,
'metadata': {},
'model': 'gpt-4-1106-preview',
'object': 'thread.run',
'required_action': None,
'started_at': 1699828370,
'status': 'queued',
'thread_id': 'thread_bICTESFvWoRdj0O0SzsosLCS',
'tools': [{'type': 'code_interpreter'},
{'type': 'retrieval'},
{'function': {'name': 'display_quiz',
'parameters': {'type': 'object',
'properties': {'title': {'type': 'string'},
'questions': {'type': 'array',
'description': 'An array of questions, each with a title and potentially options (if mul
'items': {'type': 'object',
'properties': {'question_text': {'type': 'string'},
'question_type': {'type': 'string',
'enum': ['MULTIPLE_CHOICE', 'FREE_RESPONSE']},
We can now wait for the Run to complete once again, and check our Thread!
# Messages
user: Make a quiz with 2 questions: One open ended, one multiple choice. Then, give me feedback
assistant: Thank you for attempting the quiz.
For the first question, it's important to know that the square root of a negative number is not
For the second question, the correct answer is "108 degrees." In a regular pentagon, which is a
Woohoo 🎉
Conclusion
We covered a lot of ground in this notebook, give yourself a high-five! Hopefully you should
now have a strong foundation to build powerful, stateful experiences with tools like Code
Interpreter, Retrieval, and Functions!
There are a few sections we didn't cover for the sake of brevity, so here are a few resources to
explore further:
Annotations: parsing file citations
Multi-Assistant Thread Runs: single Thread with Messages from multiple Assistants
This notebook shows how Ada embeddings can be used to implement semantic code search.
For this demonstration, we use our own openai-python code repository. We implement a
simple version of file parsing and of extracting functions from Python files, which can then be
embedded, indexed, and queried.
Helper Functions
We first set up some simple parsing functions that allow us to extract the important information
from our codebase.
import pandas as pd
from pathlib import Path

DEF_PREFIXES = ('def ', 'async def ')  # prefixes that mark a function definition

def get_function_name(code):
    """
    Extract function name from a line beginning with 'def' or 'async def'.
    """
    for prefix in DEF_PREFIXES:
        if code.startswith(prefix):
            return code[len(prefix): code.index('(')]
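The get_functions helper used below isn't included above. A simplified sketch, assuming a function's body ends at the next non-indented, non-blank line, could be:
NEWLINE = '\n'

def get_until_no_space(all_lines, i):
    """Collect lines starting at index i until the next top-level (non-indented) line."""
    ret = [all_lines[i]]
    for j in range(i + 1, len(all_lines)):
        if len(all_lines[j]) == 0 or all_lines[j][0] in (' ', '\t', ')'):
            ret.append(all_lines[j])
        else:
            break
    return NEWLINE.join(ret)

def get_functions(filepath):
    """Yield {'code', 'function_name', 'filepath'} for each top-level function in a .py file."""
    whole_code = Path(filepath).read_text()
    all_lines = whole_code.split(NEWLINE)
    for i, line in enumerate(all_lines):
        for prefix in DEF_PREFIXES:
            if line.startswith(prefix):
                code = get_until_no_space(all_lines, i)
                yield {
                    'code': code,
                    'function_name': get_function_name(code),
                    'filepath': filepath,
                }
                break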
def extract_functions_from_repo(code_root):
"""
Extract all .py functions from the repository.
"""
code_files = list(code_root.glob('**/*.py'))
num_files = len(code_files)
print(f'Total number of .py files: {num_files}')
if num_files == 0:
print('Verify openai-python repo exists and code_root is set correctly.')
return None
all_funcs = [
func
for code_file in code_files
for func in get_functions(str(code_file))
]
num_funcs = len(all_funcs)
print(f'Total number of functions extracted: {num_funcs}')
return all_funcs
Data Loading
We'll first load the openai-python folder and extract the needed information using the functions
we defined above.
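The cell that points at the repository and runs the extraction isn't shown; a sketch, assuming the openai-python repo has already been cloned next to this notebook, would be:
# Point code_root at the cloned openai-python repository and extract every function from it
code_root = Path('.').resolve() / 'openai-python'   # assumed checkout location
all_funcs = extract_functions_from_repo(code_root)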
Now that we have our content, we can pass the data to the text-embedding-3-small model
and get back our vector embeddings.
df = pd.DataFrame(all_funcs)
df['code_embedding'] = df['code'].apply(lambda x: get_embedding(x, model='text-embedding-3-small'))
df['filepath'] = df['filepath'].map(lambda x: Path(x).relative_to(code_root))
df.to_csv("data/code_search_openai-python.csv", index=False)
df.head()
Testing
Let's test our endpoint with some simple queries. If you're familiar with the openai-python
repository, you'll see that we're able to easily find the functions we're looking for with only a
simple English description.
We define a search_functions method that takes our data (which contains our embeddings), a
query string, and some other configuration options. The process of searching our database
works as follows:
1. We first embed our query string (code_query) with text-embedding-3-small . The reasoning
here is that a query string like 'a function that reverses a string' and a function like 'def
reverse(string): return string[::-1]' will be very similar when embedded.
2. We then calculate the cosine similarity between our query string embedding and all data
points in our database. This gives a distance between each point and our query.
3. We finally sort all of our data points by their distance to our query string and return the
number of results requested in the function parameters.
def search_functions(df, code_query, n=3, pprint=True, n_lines=7):
    # embed the query and rank every stored function by cosine similarity to it
    # (cosine_similarity is the helper from the cookbook's embeddings utilities)
    embedding = get_embedding(code_query, model='text-embedding-3-small')
    df['similarities'] = df.code_embedding.apply(lambda x: cosine_similarity(x, embedding))
    res = df.sort_values('similarities', ascending=False).head(n)
    if pprint:
        for r in res.iterrows():
            print(f"{r[1].filepath}:{r[1].function_name} score={round(r[1].similarities, 3)}")
            print("\n".join(r[1].code.split("\n")[:n_lines]))
            print('-' * 70)
    return res
openai/validators.py:format_inferrer_validator score=0.453
def format_inferrer_validator(df):
"""
This validator will infer the likely fine-tuning format of the data, and display it to the
It will also suggest to use ada and explain train/validation split benefits.
"""
ft_type = infer_task_type(df)
immediate_msg = None
----------------------------------------------------------------------
openai/validators.py:infer_task_type score=0.37
def infer_task_type(df):
"""
Infer the likely fine-tuning task type from the data
"""
CLASSIFICATION_THRESHOLD = 3 # min_average instances of each class
if sum(df.prompt.str.len()) == 0:
return "open-ended generation"
----------------------------------------------------------------------
openai/validators.py:apply_validators score=0.369
def apply_validators(
df,
fname,
remediation,
validators,
auto_accept,
write_out_file_func,
----------------------------------------------------------------------
openai/validators.py:get_common_xfix score=0.487
def get_common_xfix(series, xfix="suffix"):
"""
Finds the longest common suffix or prefix of all the values in a series
"""
common_xfix = ""
while True:
common_xfixes = (
series.str[-(len(common_xfix) + 1) :]
if xfix == "suffix"
else series.str[: len(common_xfix) + 1]
----------------------------------------------------------------------
openai/validators.py:common_completion_suffix_validator score=0.449
def common_completion_suffix_validator(df):
"""
This validator will suggest to add a common suffix to the completion if one doesn't already
"""
error_msg = None
immediate_msg = None
optional_msg = None
optional_fn = None
ft_type = infer_task_type(df)
----------------------------------------------------------------------
openai/cli.py:tools_register score=0.391
def tools_register(parser):
subparsers = parser.add_subparsers(
title="Tools", help="Convenience client side tools"
)
def help(args):
parser.print_help()
parser.set_defaults(func=help)
sub = subparsers.add_parser("fine_tunes.prepare_data")
sub.add_argument(
"-f",
"--file",
required=True,
help="JSONL, JSON, CSV, TSV, TXT or XLSX file containing prompt-completion examples to
"This should be the local file path.",
)
sub.add_argument(
"-q",
----------------------------------------------------------------------
By default, when you request a completion from the OpenAI API, the entire completion is generated
before being sent back in a single response.
If you're generating long completions, waiting for the response can take many seconds.
To get responses sooner, you can 'stream' the completion as it's being generated. This allows
you to start printing or processing the beginning of the completion before the full completion is
finished.
To stream completions, set stream=True when calling the chat completions or completions
endpoints. This will return an object that streams back the response as data-only server-sent
events. Extract chunks from the delta field rather than the message field.
Downsides
Note that using stream=True in a production application makes it more difficult to moderate
the content of the completions, as partial completions may be more difficult to evaluate. This
may have implications for approved usage.
Another small drawback of streaming responses is that the response no longer includes the
usage field to tell you how many tokens were consumed. After receiving and combining all of
the responses, you can calculate this yourself using tiktoken .
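For example, once you've concatenated the streamed chunks, a rough count with tiktoken (assuming you collected the non-None delta.content strings into a list named collected_content; that name is just illustrative) could be:
import tiktoken

completion_text = "".join(collected_content)
encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")
print(f"Completion tokens: {len(encoding.encode(completion_text))}")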
Example code
Below, this notebook shows what a typical chat completion response looks like, what a
streaming chat completion response looks like, and how much time streaming can save.
# imports
import time # for measuring time duration of API calls
from openai import OpenAI
import os
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
With a typical ChatCompletions API call, the response is first computed and then returned all at
once.
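The request cell isn't reproduced here; judging from the output below, it was an ordinary (non-streaming) chat completion along these lines (the exact prompt is an assumption inferred from the output):
# a plain ChatCompletion request: nothing is returned until the whole completion is done
response = client.chat.completions.create(
    model='gpt-3.5-turbo',
    messages=[{'role': 'user', 'content': 'Count to 100, with a comma between each number and no newlines. E.g., 1, 2, 3, ...'}],
    temperature=0,
)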
reply = response.choices[0].message
print(f"Extracted reply: \n{reply}")
reply_content = response.choices[0].message.content
print(f"Extracted content: \n{reply_content}")
Extracted reply:
ChatCompletionMessage(content='1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 1
Extracted content:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26,
With a streaming API call, the response is sent back incrementally in chunks via an event
stream. In Python, you can iterate over these events with a for loop.
# a ChatCompletion request
response = client.chat.completions.create(
model='gpt-3.5-turbo',
messages=[
{'role': 'user', 'content': "What's 1+1? Answer in one word."}
],
temperature=0,
stream=True # this time, we set stream=True
)
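The loop that consumes the stream isn't shown; iterating over the response object yields ChatCompletionChunk events, which is what produces the output below:
# iterate over the server-sent events as they arrive
for chunk in response:
    print(chunk)                           # the full ChatCompletionChunk
    print(chunk.choices[0].delta.content)  # just the newly streamed text (None on the final chunk)
    print("****************")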
****************
ChatCompletionChunk(id='chatcmpl-8ZB9m2Ubv8FJs3CIb84WvYwqZCHST', choices=[Choice(delta=ChoiceDe
2
****************
ChatCompletionChunk(id='chatcmpl-8ZB9m2Ubv8FJs3CIb84WvYwqZCHST', choices=[Choice(delta=ChoiceDe
None
****************
As you can see above, streaming responses have a delta field rather than a message field.
delta can hold things like a role token, a content token, or nothing at all when the stream is finished.
Time comparison
In the example above, both requests took about 4 to 5 seconds to fully complete. Request times
will vary depending on load and other stochastic factors.
However, with the streaming request, we received the first token after 0.1 seconds, and
subsequent tokens every ~0.01-0.02 seconds.
This notebook expands on the other Redis OpenAI-cookbook examples with examples of how
to use JSON with vectors.
Storing Vectors in JSON
Prerequisites
Redis instance with the Redis Search and Redis JSON modules
Installation
OPENAI_API_KEY=your_key
import openai
import os
from dotenv import load_dotenv
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
Japan's economy teetered on the brink of a technical recession in the three months to September, figu
Revised figures indicated growth of just 0.1% - and a similar-sized contraction in the previous quart
The government was keen to play down the worrying implications of the data. "I maintain the view that
"""
Ethiopia's Tirunesh Dibaba set a new world record in winning the women's 5,000m at the Boston Indoor
Dibaba won in 14 minutes 32.93 seconds to erase the previous world indoor mark of 14:39.29 set by ano
"""
Search engine firm Google has released a trial tool which is concerning some net users because it dir
The AutoLink feature comes with Google's latest toolbar and provides links in a webpage to Amazon.com
AutoLink works by creating a link to a website based on information contained in a webpage - even if
If a user clicks the AutoLink feature in the Google toolbar then a webpage with a book's unique ISBN
The new tool has been compared to the Smart Tags feature from Microsoft by some users. It was widely
"""
! docker compose up -d
REDIS_URL = 'redis://localhost:6379'
client = from_url(REDIS_URL)
client.ping()
True
Create Index
FT.CREATE
schema = [ VectorField('$.vector',
"FLAT",
{ "TYPE": 'FLOAT32',
"DIM": len(doc_1['vector']),
"DISTANCE_METRIC": "COSINE"
}, as_name='vector' ),
TextField('$.content', as_name='content')
]
idx_def = IndexDefinition(index_type=IndexType.JSON, prefix=['doc:'])
try:
client.ft('idx').dropindex()
except:
pass
client.ft('idx').create_index(schema, definition=idx_def)
b'OK'
Load Data into Redis as JSON objects
Redis JSON
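The loading cell isn't shown here; with Redis JSON, each record is stored as a JSON object under the doc: prefix declared in the index definition. A sketch, assuming the three articles above were embedded into dicts doc_1, doc_2 and doc_3 with 'content' and 'vector' fields (doc_2 and doc_3 are assumed names):
# store each document as a JSON object under the 'doc:' prefix the index expects
for i, doc in enumerate([doc_1, doc_2, doc_3], start=1):
    client.json().set(f'doc:{i}', '$', doc)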
True
Semantic Search
Given a sports-related article, search Redis via Vector Similarity Search (VSS) for similar articles.
KNN Search
Paula Radcliffe has been granted extra time to decide whether to compete in the World Cross-Country C
The 31-year-old is concerned the event, which starts on 19 March in France, could upset her preparati
"""
Ethiopia's Tirunesh Dibaba set a new world record in winning the women's 5,000m at the Boston I
Dibaba won in 14 minutes 32.93 seconds to erase the previous world indoor mark of 14:39.29 set
Revised figures indicated growth of just 0.1% - and a similar-sized contraction in the previous
The government was keen to play down the worrying implications of the data. "I maintain the vie
Search engine firm Google has released a trial tool which is concerning some net users because
The AutoLink feature comes with Google's latest toolbar and provides links in a webpage to Amaz
If a user clicks the AutoLink feature in the Google toolbar then a webpage with a book's unique
The new tool has been compared to the Smart Tags feature from Microsoft by some users. It was w
Hybrid Search
Use a combination of full text search and VSS to find a matching article. For this scenario, we
filter on a full text search of the term 'recession' and then find the KNN articles. In this case, only
business-related articles match. As a reminder, document #1 was about a recession in Japan.
Hybrid Queries
Ethiopia produced 14.27 million tonnes of crops in 2004, 24% higher than in 2003 and 21% more than th
In 2003, crop production totalled 11.49 million tonnes, the joint report from the Food and Agricultur
The report calculated emergency food requirements for 2005 to be 387,500 tonnes. On top of that, 89,0
In eastern and southern Ethiopia, a prolonged drought has killed crops and drained wells. Last year,
"""
Japan's economy teetered on the brink of a technical recession in the three months to September
Revised figures indicated growth of just 0.1% - and a similar-sized contraction in the previous
The government was keen to play down the worrying implications of the data. "I maintain the vie
This notebook takes you through a simple flow to download some data, embed it, and then
index and search it using a selection of vector databases. This is a common requirement for
customers who want to store and search our embeddings with their own data in a secure
environment to support production use cases such as chatbots, topic modelling and more.
Vector databases enable enterprises to take many of the embeddings use cases we've shared in
this repo (question and answering, chatbot and recommendation services, for example), and
make use of them in a secure, scalable environment. Many of our customers use embeddings to
solve their problems at small scale, but performance and security hold them back from going
into production - we see vector databases as a key component in solving that, and in this guide
we'll walk through the basics of embedding text data, storing it in a vector database and using it
for semantic search.
Demo Flow
The demo flow is:
Setup: Import packages and set any required variables
Redis
Index Data: Create the search index for vector search and hybrid search (vector + full-
text search) on all available fields.
Search Data: Run a few example queries with various goals in mind.
Once you've run through this notebook you should have a basic understanding of how to setup
and use vector databases, and can move on to more complex use cases making use of our
embeddings.
Setup
Import the required libraries and set the embedding model that we'd like to use.
import openai
# I've set this to our new embeddings model; this can be changed to the embedding model of your choice
EMBEDDING_MODEL = "text-embedding-3-small"
# Ignore unclosed SSL socket warnings - optional in case you get these errors
import warnings
warnings.filterwarnings(action="ignore", message="unclosed", category=ResourceWarning)
warnings.filterwarnings("ignore", category=DeprecationWarning)
Load data
In this section we'll load embedded data that we prepared ahead of this session.
embeddings_url = 'https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedde
import zipfile
with zipfile.ZipFile("vector_database_wikipedia_articles_embedded.zip","r") as zip_ref:
zip_ref.extractall("../data")
article_df = pd.read_csv('../data/vector_database_wikipedia_articles_embedded.csv')
article_df.head()
article_df.info(show_counts=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25000 entries, 0 to 24999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 25000 non-null int64
1 url 25000 non-null object
2 title 25000 non-null object
3 text 25000 non-null object
4 title_vector 25000 non-null object
5 content_vector 25000 non-null object
6 vector_id 25000 non-null object
dtypes: int64(1), object(6)
memory usage: 1.3+ MB
Redis
The next vector database covered in this tutorial is Redis. You most likely already know Redis.
What you might not be aware of is the RediSearch module. Enterprises have been using Redis
with the RediSearch module for years now across all major cloud providers, Redis Cloud, and on
premise. Recently, the Redis team added vector storage and search capability to this module in
addition to the features RediSearch already had.
Given the large ecosystem around Redis, there are most likely client libraries in the language
you need. You can use any standard Redis client library to run RediSearch commands, but it's
easiest to use a library that wraps the RediSearch API. Below are a few examples, but you can
find more client libraries here.
In the below cells, we will walk you through using Redis as a vector database. Since many of you
are likely already used to the Redis API, this should be familiar to most.
Setup
There are many ways to deploy Redis with RediSearch. The easiest way to get started is to use
Docker, but there are many potential options for deployment. For other deployment
options, see the redis directory in this repo.
Start a version of Redis with RediSearch (Redis Stack) by running the following docker command
$ cd redis
$ docker compose up -d
This also includes the RedisInsight GUI for managing your Redis database which you can view
at http://localhost:8001 once you start the docker container.
You're all set up and ready to go! Next, we import and create our client for communicating with
the Redis database we just created.
import redis
from redis.commands.search.indexDefinition import (
IndexDefinition,
IndexType
)
from redis.commands.search.query import Query
from redis.commands.search.field import (
TextField,
VectorField
)
REDIS_HOST = "localhost"
REDIS_PORT = 6379
REDIS_PASSWORD = "" # default for passwordless Redis
# Connect to Redis
redis_client = redis.Redis(
host=REDIS_HOST,
port=REDIS_PORT,
password=REDIS_PASSWORD
)
redis_client.ping()
True
The below cells will show how to specify and create a search index in Redis. We will:
1. Set some constants for defining our index, like the distance metric and the index name
2. Define the fields of the index with RediSearch field types
3. Create the search index
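The index-creation cells themselves aren't reproduced here; a sketch of their general shape (constants, RediSearch field definitions, then the index creation) follows. The field names mirror the dataframe columns above, while the exact constants are assumptions:
# constants for the index
VECTOR_DIM = 1536                 # dimension of the stored embeddings (assumed)
INDEX_NAME = "embeddings-index"   # name of the search index
PREFIX = "doc"                    # key prefix for indexed documents
DISTANCE_METRIC = "COSINE"        # distance metric for vector comparisons

# RediSearch fields for each column we want searchable
fields = [
    TextField(name="title"),
    TextField(name="url"),
    TextField(name="text"),
    VectorField("title_vector", "FLAT",
                {"TYPE": "FLOAT32", "DIM": VECTOR_DIM, "DISTANCE_METRIC": DISTANCE_METRIC}),
    VectorField("content_vector", "FLAT",
                {"TYPE": "FLOAT32", "DIM": VECTOR_DIM, "DISTANCE_METRIC": DISTANCE_METRIC}),
]

# create the index over HASH documents stored under the given prefix
redis_client.ft(INDEX_NAME).create_index(
    fields=fields,
    definition=IndexDefinition(prefix=[PREFIX], index_type=IndexType.HASH),
)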
Now that we have a search index, we can load documents into it. We will use the same
documents we used in the previous examples. In Redis, either the Hash or JSON (if using
RedisJSON in addition to RediSearch) data types can be used to store documents. We will use
the HASH data type in this example. The below cells will show how to load documents into the
index.
Now that we have a search index and documents loaded into it, we can run search queries.
Below we will provide a function that will run a search query and return the results. Using this
function we run a few queries that will show how you can utilize Redis as a vector database.
Each example will demonstrate specific features to keep in mind when developing your search
application with Redis.
1. Return Fields: You can specify which fields you want to return in the search results. This is
useful if you only want to return a subset of the fields in your documents and doesn't
require a separate call to retrieve documents. In the below example, we will only return the
title field in the search results.
2. Hybrid Search: You can combine vector search with any of the other RediSearch fields for
hybrid search such as full text search, tag, geo, and numeric. In the below example, we will
combine vector search with full text search.
def search_redis(
    redis_client: redis.Redis,
    user_query: str,
    index_name: str = "embeddings-index",
    vector_field: str = "title_vector",
    return_fields: list = ["title", "url", "text", "vector_score"],
    hybrid_fields = "*",
    k: int = 20,
) -> List[dict]:
    # embed the query, run a KNN search (optionally pre-filtered by hybrid_fields), and return the hits
    embedded_query = openai.embeddings.create(input=user_query, model=EMBEDDING_MODEL).data[0].embedding
    base_query = f'{hybrid_fields}=>[KNN {k} @{vector_field} $vector AS vector_score]'
    query = Query(base_query).return_fields(*return_fields).sort_by("vector_score").paging(0, k).dialect(2)
    params = {"vector": np.array(embedded_query, dtype=np.float32).tobytes()}
    results = redis_client.ft(index_name).search(query, params)
    for i, article in enumerate(results.docs):
        print(f"{i}. {article.title} (Score: {round(1 - float(article.vector_score), 3)})")
    return results.docs
The previous examples showed how to run vector search queries with RediSearch. In this section,
we will show how to combine vector search with other RediSearch fields for hybrid search. In the
below example, we will combine vector search with full text search.
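The create_hybrid_field helper used in the next cells isn't shown above; a minimal version consistent with how it's called (build a full-text filter clause for a given field and search term) would be:
def create_hybrid_field(field_name: str, value: str) -> str:
    # e.g. create_hybrid_field("title", "Scottish") -> '@title:"Scottish"'
    return f'@{field_name}:"{value}"'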
# search the content vector for articles about famous battles in Scottish history and only include re
results = search_redis(redis_client,
"Famous battles in Scottish history",
vector_field="title_vector",
k=5,
hybrid_fields=create_hybrid_field("title", "Scottish")
)
# run a hybrid query for articles about Art in the title vector and only include results with the phr
results = search_redis(redis_client,
"Art",
vector_field="title_vector",
k=5,
hybrid_fields=create_hybrid_field("text", "Leonardo da Vinci")
)
# find specific mention of Leonardo da Vinci in the text that our full-text-search query returned
mention = [sentence for sentence in results[0].text.split("\n") if "Leonardo da Vinci" in sentence][0
mention
'In Europe, after the Middle Ages, there was a "Renaissance" which means "rebirth". People redi
For more examples of using Redis as a vector database, see the README and examples within the
vector_databases/redis directory of this repository.
This code demonstrates how to interact with ChatGPT functions to perform tasks related to
Amazon S3 buckets. The notebook covers S3 bucket key functionalities such as running simple
listing commands, searching for a specific file in all buckets, uploading a file to a bucket, and
downloading a file from a bucket. The OpenAI Chat API understands the user instructions,
generates the natural language responses, and extracts appropriate function calls based on the
user's input.
Requirements: To run the notebook, generate AWS access keys with S3 bucket write permission
and store them in a local environment file alongside the OpenAI key. The " .env " file format:
AWS_ACCESS_KEY_ID=<your-key>
AWS_SECRET_ACCESS_KEY=<your-key>
OPENAI_API_KEY=<your-key>
True
Initials
OpenAI.api_key = os.environ.get("OPENAI_API_KEY")
GPT_MODEL = "gpt-3.5-turbo"
# Optional - if you had issues loading the environment file, you can set the AWS values using the bel
# os.environ['AWS_ACCESS_KEY_ID'] = ''
# os.environ['AWS_SECRET_ACCESS_KEY'] = ''
# Create S3 client
s3_client = boto3.client('s3')
Utilities
Create helper functions to interact with the S3 service, such as listing buckets, listing objects,
downloading and uploading files, and searching for specific files.
def datetime_converter(obj):
if isinstance(obj, datetime.datetime):
return obj.isoformat()
raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")
def list_buckets():
response = s3_client.list_buckets()
return json.dumps(response['Buckets'], default=datetime_converter)
def search_s3_objects(search_name, bucket=None, prefix='', exact_match=True):
    # compare against the object key using the bare file name of the search term
    filename = os.path.basename(search_name).lower()
    if bucket is None:
        buckets_response = json.loads(list_buckets())
        buckets = [bucket_info["Name"] for bucket_info in buckets_response]
    else:
        buckets = [bucket]
    results = []
    for bucket_name in buckets:
        objects = json.loads(list_objects(bucket_name, prefix))
        if exact_match:
            bucket_results = [obj for obj in objects if filename == obj['Key'].lower()]
        else:
            bucket_results = [obj for obj in objects if filename in obj['Key'].lower()]
        if bucket_results:
            results.extend([{"Bucket": bucket_name, "Object": obj} for obj in bucket_results])
    return json.dumps(results)
The dictionary below maps each function name to the corresponding Python function, so it can
be executed based on the model's response.
available_functions = {
"list_buckets": list_buckets,
"list_objects": list_objects,
"download_file": download_file,
"upload_file": upload_file,
"search_s3_objects": search_s3_objects
}
ChatGPT
Conversation flow
Create a main function for the chatbot, which takes user input, sends it to the OpenAI Chat API,
receives a response, executes any function calls generated by the API, and returns a final
response to the user.
system_message=f"Don't make assumptions about what values to plug into functions. Ask for clarifi
messages = [{"role": "system", "content": system_message},
{"role": "user", "content": user_input}]
if is_log:
print(response.choices)
return final_message
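The middle of the conversation function (the Chat Completions call and the dispatch of any function calls) is omitted above. Here is a sketch of that flow using the tools interface, assuming a functions_definitions list holding the JSON schemas for the five helpers (that list is not shown above, so the name is an assumption) and the available_functions dictionary defined earlier:
import json
from openai import OpenAI

client = OpenAI()  # uses OPENAI_API_KEY from the environment

def run_conversation(user_input, is_log=False):
    system_message = ("Don't make assumptions about what values to plug into functions. "
                      "Ask for clarification if a user request is ambiguous.")
    messages = [{"role": "system", "content": system_message},
                {"role": "user", "content": user_input}]
    # first call: let the model decide whether one of our S3 helpers should run
    response = client.chat.completions.create(
        model=GPT_MODEL,
        messages=messages,
        tools=[{"type": "function", "function": f} for f in functions_definitions],  # assumed schema list
        tool_choice="auto",
    )
    message = response.choices[0].message
    if message.tool_calls:
        messages.append(message)
        for tool_call in message.tool_calls:
            fn = available_functions[tool_call.function.name]
            result = fn(**json.loads(tool_call.function.arguments))
            messages.append({"role": "tool", "tool_call_id": tool_call.id, "content": result})
        # second call: turn the function result into a natural-language reply
        response = client.chat.completions.create(model=GPT_MODEL, messages=messages)
    final_message = response.choices[0].message.content
    if is_log:
        print(response.choices)
    return final_message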
In the following examples, make sure to replace the placeholders such as <file_name> ,
<bucket_name> , and <directory_path> with your specific values before execution.
print(run_conversation('list my S3 buckets'))
You can ask the assistant to search for a specific file name either in all the buckets or in a
specific one.
search_file = '<file_name>'
print(run_conversation(f'search for a file {search_file} in all buckets'))
search_word = '<file_name_part>'
bucket_name = '<bucket_name>'
print(run_conversation(f'search for a file contains {search_word} in {bucket_name}'))
The model is expected to ask the user for clarification when parameter values are ambiguous,
as described in the system message.
Sure, to help me find what you're looking for, could you please provide the name of the file yo
We also instructed the model to reject irrelevant tasks. Let's test it out and see how it works in
action.
# the model should not answer details not related to the scope
print(run_conversation('what is the weather today'))
Apologies for the misunderstanding, but I am only able to assist with S3 bucket functions. Can
The provided functions are not limited to just retrieving information. They can also assist the
user in uploading or downloading files.
Download a file
search_file = '<file_name>'
bucket_name = '<bucket_name>'
local_directory = '<directory_path>'
print(run_conversation(f'download {search_file} from {bucket_name} bucket to {local_directory} direct
Upload a file
local_file = '<file_name>'
bucket_name = '<bucket_name>'
print(run_conversation(f'upload {local_file} to {bucket_name} bucket'))
CQL Version
In this quickstart you will learn how to build a "philosophy quote finder & generator" using
OpenAI's vector embeddings and Apache Cassandra®, or equivalently DataStax Astra DB
through CQL, as the vector store for data persistence.
The basic workflow of this notebook is outlined below. You will evaluate and store the vector
embeddings for a number of quotes by famous philosophers, use them to build a powerful
search engine and, after that, even a generator of new quotes!
The notebook exemplifies some of the standard usage patterns of vector search -- while
showing how easy it is to get started with the vector capabilities of Cassandra / Astra DB
through CQL.
For a background on using vector search and text embeddings to build a question-answering
system, please check out this excellent hands-on notebook: Question answering using
embeddings.
Choose-your-framework
Please note that this notebook uses the Cassandra drivers and runs CQL (Cassandra Query
Language) statements directly, but we cover other choices of technology to accomplish the
same task. Check out this folder's README for other options. This notebook can run either as a
Colab notebook or as a regular Jupyter notebook.
Table of contents:
Setup
Get DB connection
Connect to OpenAI
How it works
Indexing
Each quote is made into an embedding vector with OpenAI's Embedding . These are saved in the
Vector Store for later use in searching. Some metadata, including the author's name and a few
other pre-computed tags, are stored alongside, to allow for search customization.
Search
To find a quote similar to the provided search quote, the latter is made into an embedding
vector on the fly, and this vector is used to query the store for similar vectors ... i.e. similar
quotes that were previously indexed. The search can optionally be constrained by additional
metadata ("find me quotes by Spinoza similar to this one ...").
The key point here is that "quotes similar in content" translates, in vector space, to vectors that
are metrically close to each other: thus, vector similarity search effectively implements semantic
similarity. This is the key reason vector embeddings are so powerful.
The sketch below tries to convey this idea. Each quote, once it's made into a vector, is a point in
space. Well, in this case it's on a sphere, since OpenAI's embedding vectors, as most others, are
normalized to unit length. Oh, and the sphere is actually not three-dimensional, rather 1536-
dimensional!
So, in essence, a similarity search in vector space returns the vectors that are closest to the
query vector:
Generation
Given a suggestion (a topic or a tentative quote), the search step is performed, and the first
returned results (quotes) are fed into an LLM prompt which asks the generative model to invent
a new text along the lines of the passed examples and the initial suggestion.
Setup
import os
from uuid import uuid4
from getpass import getpass
from collections import Counter
import openai
from datasets import load_dataset
Don't mind the next cell too much; we need it to detect Colab and let you upload the SCB file (see below):
try:
from google.colab import files
IS_COLAB = True
except ModuleNotFoundError:
IS_COLAB = False
Get DB connection
A couple of secrets are required to create a Session object (a connection to your Astra DB
instance).
(Note: some steps will be slightly different on Google Colab and on local Jupyter, that's why the
notebook will detect the runtime type.)
Please provide the full path to your Secure Connect Bundle zipfile: /path/to/secure-connect-Da
Please provide your Database Token ('AstraCS:...' string): ········
Please provide the Keyspace name for your Database: my_keyspace
(Incidentally, you could also use any Cassandra cluster (as long as it provides Vector capabilities),
just by changing the parameters to the following Cluster instantiation.)
# Don't mind the "Closing connection" error after "downgrading protocol..." messages you may see,
# it is really just a warning: the connection will work smoothly.
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

cluster = Cluster(
cloud={
"secure_connect_bundle": ASTRA_DB_SECURE_BUNDLE_PATH,
},
auth_provider=PlainTextAuthProvider(
"token",
ASTRA_DB_APPLICATION_TOKEN,
),
)
session = cluster.connect()
keyspace = ASTRA_DB_KEYSPACE
Each row will store: a quote, its vector embedding, the quote author and a set of "tags". You
also need a primary key to ensure uniqueness of rows.
The following is the full CQL command that creates the table (check out this page for more on
the CQL syntax of this and the following statements):
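The statement itself is not reproduced in this excerpt. Based on the columns used by the INSERT further down (quote_id, author, body, embedding_vector, tags) and the 1536-dimensional embeddings, a plausible sketch is (the exact column types are assumptions):
create_table_statement = f"""
CREATE TABLE IF NOT EXISTS {keyspace}.philosophers_cql (
    quote_id UUID PRIMARY KEY,
    body TEXT,
    author TEXT,
    tags SET<TEXT>,
    embedding_vector VECTOR<FLOAT, 1536>
);
"""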
session.execute(create_table_statement)
<cassandra.cluster.ResultSet at 0x7feee37b3460>
In order to run ANN (approximate-nearest-neighbor) searches on the vectors in the table, you
need to create a specific index on the embedding_vector column.
When creating the index, you can optionally choose the "similarity function" used to compute
vector distances: since for unit-length vectors (such as those from OpenAI) the "cosine difference"
is the same as the "dot product", you'll use the latter which is computationally less expensive.
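The index-creation statement is not shown here; as a rough sketch (the index name and option spelling are assumptions), it could be a storage-attached index (SAI) configured for dot-product similarity:
create_vector_index_statement = f"""
CREATE CUSTOM INDEX IF NOT EXISTS idx_embedding_vector
    ON {keyspace}.philosophers_cql (embedding_vector)
    USING 'org.apache.cassandra.index.sai.StorageAttachedIndex'
    WITH OPTIONS = {{'similarity_function': 'dot_product'}};
"""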
session.execute(create_vector_index_statement)
<cassandra.cluster.ResultSet at 0x7feeefd3da00>
Add indexes for author and tag filtering
That is enough to run vector searches on the table ... but you want to be able to optionally
specify an author and/or some tags to restrict the quote search. Create two other indexes to
support this:
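A possible shape for these two statements (the names are assumptions; the tags column is a set, hence the VALUES() collection index):
create_author_index_statement = f"""
CREATE CUSTOM INDEX IF NOT EXISTS idx_author
    ON {keyspace}.philosophers_cql (author)
    USING 'org.apache.cassandra.index.sai.StorageAttachedIndex';
"""

create_tags_index_statement = f"""
CREATE CUSTOM INDEX IF NOT EXISTS idx_tags
    ON {keyspace}.philosophers_cql (VALUES(tags))
    USING 'org.apache.cassandra.index.sai.StorageAttachedIndex';
"""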
<cassandra.cluster.ResultSet at 0x7fef2c64af70>
Connect to OpenAI
client = openai.OpenAI(api_key=OPENAI_API_KEY)
embedding_model_name = "text-embedding-3-small"
result = client.embeddings.create(
input=[
"This is a sentence",
"A second sentence"
],
model=embedding_model_name,
)
Note: the above is the syntax for OpenAI v1.0+. If using previous versions, the code to get the
embeddings will look different.
print(f"len(result.data) = {len(result.data)}")
print(f"result.data[1].embedding = {str(result.data[1].embedding)[:55]}...")
print(f"len(result.data[1].embedding) = {len(result.data[1].embedding)}")
len(result.data) = 2
result.data[1].embedding = [-0.0108176339417696, 0.0013546717818826437, 0.00362232...
len(result.data[1].embedding) = 1536
Get a dataset with the quotes. (We adapted and augmented the data from this Kaggle dataset,
ready to use in this demo.)
philo_dataset = load_dataset("datastax/philosopher-quotes")["train"]
A quick inspection:
An example entry:
{'author': 'aristotle', 'quote': 'Love well, be loved and do something of value.', 'tags': 'lov
You will compute the embeddings for the quotes and save them into the Vector Store, along
with the text itself and the metadata planned for later use.
To optimize speed and reduce the number of API calls, you'll send batched requests to the OpenAI embedding service.
The DB write is accomplished with a CQL statement. But since you'll run this particular insertion
several times (albeit with different values), it's best to prepare the statement and then just run it
over and over.
(Note: for faster insertion, the Cassandra drivers would let you do concurrent inserts, which we don't do here to keep the demo code straightforward.)
prepared_insertion = session.prepare(
f"INSERT INTO {keyspace}.philosophers_cql (quote_id, author, body, embedding_vector, tags) VALUES
)
BATCH_SIZE = 20
quotes_list = philo_dataset["quote"]
authors_list = philo_dataset["author"]
tags_list = philo_dataset["tags"]
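The insertion loop itself is not shown above; a minimal sketch of the batched embed-and-insert pass (UUID generation uses the import above; splitting tags on ';' is an assumption about the dataset's tag format):
for batch_start in range(0, len(quotes_list), BATCH_SIZE):
    batch_quotes = quotes_list[batch_start:batch_start + BATCH_SIZE]
    # One embeddings call per batch of quotes
    response = client.embeddings.create(input=batch_quotes, model=embedding_model_name)
    for i, quote in enumerate(batch_quotes):
        emb_vector = response.data[i].embedding
        author = authors_list[batch_start + i]
        raw_tags = tags_list[batch_start + i]
        tags = set(raw_tags.split(';')) if raw_tags else set()  # tag format is an assumption
        session.execute(prepared_insertion, (uuid4(), author, quote, emb_vector, tags))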
[('Life to the great majority is only a constant struggle for mere existence, with the certaint
'schopenhauer'),
('We give up leisure in order that we may have leisure, just as we go to war in order that we
'aristotle'),
('Perhaps the gods are kind to us, by making life more disagreeable as we grow older. In the e
'freud')]
Search constrained to a tag (out of those saved earlier with the quotes):
[('Mankind will never see an end of trouble until lovers of wisdom come to hold political power
'plato'),
('Everything the State says is a lie, and everything it has it has stolen.',
'nietzsche')]
To keep this issue under control, you can get the actual "similarity" between the query and each result, and then set a cutoff on it, effectively discarding results whose similarity falls below that threshold. Tuning this threshold correctly is not an easy problem: here, we'll just show you the way.
To get a feeling on how this works, try the following query and play with the choice of quote
and threshold to compare the results:
Note (for the mathematically inclined): this value is a rescaling between zero and one of the
cosine difference between the vectors, i.e. of the scalar product divided by the product of the norms
of the two vectors. In other words, this is 0 for opposite-facing vectors and +1 for parallel vectors.
For other measures of similarity, check the documentation -- and keep in mind that the metric in
the SELECT query should match the one used when creating the index earlier for meaningful,
ordered results.
similarity_threshold = 0.92
quote_vector = client.embeddings.create(
input=[quote],
model=embedding_model_name,
).data[0].embedding
# Once more: remember to prepare your statements in production for greater performance...
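The query applying the cutoff is not reproduced here. Assuming the similarity_dot_product CQL function that ships with vector search (and leaving the statement unprepared for brevity), it might look like:
search_statement = f"""
SELECT body, author, similarity_dot_product(embedding_vector, %s) AS similarity
    FROM {keyspace}.philosophers_cql
    ORDER BY embedding_vector ANN OF %s
    LIMIT %s;
"""
rows = session.execute(search_statement, (quote_vector, quote_vector, 8))
results_above_threshold = [
    (row.body, row.author, row.similarity)
    for row in rows
    if row.similarity >= similarity_threshold
]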
You also need a template for the prompt that will be filled for the generate-quote LLM
completion task.
completion_model_name = "gpt-3.5-turbo"
ACTUAL EXAMPLES:
{examples}
"""
Like for search, this functionality is best wrapped into a handy function (which internally uses
search):
Note: similar to the case of the embedding computation, the code for the Chat Completion API
would be slightly different for OpenAI prior to v1.0.
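Such a wrapper is not reproduced in this excerpt; a sketch of one possible shape (the find_quote_and_author search helper, the prompt_template variable and its placeholder names, and the sampling parameters are assumptions):
def generate_quote(topic, n=2, author=None, tags=None):
    # Retrieve a few similar quotes to use as examples in the prompt
    quotes = find_quote_and_author(topic, n, author=author, tags=tags)
    if not quotes:
        print('** no example quotes found, giving up.')
        return None
    prompt = prompt_template.format(
        instruction=topic,
        examples='\n'.join(f'  - {body} ({auth})' for body, auth in quotes),
    )
    response = client.chat.completions.create(
        model=completion_model_name,
        messages=[{'role': 'user', 'content': prompt}],
        temperature=0.7,
        max_tokens=320,
    )
    return response.choices[0].message.content.strip()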
Putting quote generation to test
Just passing a text (a "quote", but one can actually just suggest a topic since its vector
embedding will still end up at the right place in the vector space):
** quotes found:
** - Happiness is the reward of virtue. (aristotle)
** - Our moral virtues benefit mainly other people; intellectual virtues, on the other hand,
** end of logging
** quotes found:
** - Because Christian morality leaves animals out of account, they are at once outlawed in
** - The assumption that animals are without rights, and the illusion that our treatment of
** end of logging
(Optional) Partitioning
There's an interesting topic to examine before completing this quickstart. While, generally, tags
and quotes can be in any relationship (e.g. a quote having multiple tags), authors are effectively
an exact grouping (they define a "disjoint partitioning" on the set of quotes): each quote has
exactly one author (for us, at least).
Now, suppose you know in advance your application will usually (or always) run queries on a
single author. Then you can take full advantage of the underlying database structure: if you
group quotes in partitions (one per author), vector queries on just an author will use less
resources and return much faster.
We'll not dive into the details here, which have to do with the Cassandra storage internals: the
important message is that if your queries are run within a group, consider partitioning
accordingly to boost performance.
The partitioning per author calls for a new table schema: create a new table called
"philosophers_cql_partitioned", along with the necessary indexes:
session.execute(create_table_p_statement)
session.execute(create_vector_index_p_statement)
<cassandra.cluster.ResultSet at 0x7fef149d7940>
However, by way of demonstration, you will take advantage of a handy facility offered by the
Cassandra drivers to easily run several queries (in this case, INSERT s) concurrently. This is
something that Cassandra / Astra DB through CQL supports very well and can lead to a
significant speedup, with very few changes in the client code.
(Note: one could additionally have cached the embeddings computed previously to save a few API
tokens -- here, however, we wanted to keep the code easier to inspect.)
prepared_insertion = session.prepare(
f"INSERT INTO {keyspace}.philosophers_cql_partitioned (quote_id, author, body, embedding_vector,
)
BATCH_SIZE = 50
quotes_list = philo_dataset["quote"]
authors_list = philo_dataset["author"]
tags_list = philo_dataset["tags"]
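The concurrent insertion pass is not reproduced here; with the driver's execute_concurrent_with_args utility it could look roughly like this (the batched embedding calls and the ';'-separated tag format are assumptions, as before):
from cassandra.concurrent import execute_concurrent_with_args

for batch_start in range(0, len(quotes_list), BATCH_SIZE):
    batch_quotes = quotes_list[batch_start:batch_start + BATCH_SIZE]
    response = client.embeddings.create(input=batch_quotes, model=embedding_model_name)
    # One parameter tuple per row; the driver then runs the INSERTs concurrently
    params = [
        (
            uuid4(),
            authors_list[batch_start + i],
            quote,
            response.data[i].embedding,
            set(tags_list[batch_start + i].split(';')) if tags_list[batch_start + i] else set(),
        )
        for i, quote in enumerate(batch_quotes)
    ]
    execute_concurrent_with_args(session, prepared_insertion, params, concurrency=16)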
Despite the different table schema, the DB query behind the similarity search is essentially the
same:
That's it: the new table still supports the "generic" similarity searches all right ...
[('Life to the great majority is only a constant struggle for mere existence, with the certaint
'schopenhauer'),
('We give up leisure in order that we may have leisure, just as we go to war in order that we
'aristotle'),
('Perhaps the gods are kind to us, by making life more disagreeable as we grow older. In the e
'freud')]
... but it's when an author is specified that you would notice a huge performance advantage:
Well, you would notice a performance gain, if you had a realistic-size dataset. In this demo, with
a few tens of entries, there's no noticeable difference -- but you get the idea.
Conclusion
Congratulations! You have learned how to use OpenAI for vector embeddings and Astra DB /
Cassandra for storage in order to build a sophisticated philosophical search engine and quote
generator.
This example used the Cassandra drivers and runs CQL (Cassandra Query Language) statements
directly to interface with the Vector Store - but this is not the only choice. Check the README
for other options and integration with popular frameworks.
To find out more on how Astra DB's Vector Search capabilities can be a key ingredient in your
ML/GenAI applications, visit Astra DB's web page on the topic.
Cleanup
If you want to remove all resources used for this demo, run this cell (warning: this will delete the
tables and the data inserted in them!):
<cassandra.cluster.ResultSet at 0x7fef149096a0>
This notebook offers a guide to improving Whisper's transcriptions. We'll streamline your
audio data via trimming and segmentation, enhancing Whisper's transcription quality. After
transcriptions, we'll refine the output by adding punctuation, adjusting product terminology
(e.g., 'five two nine' to '529'), and mitigating Unicode issues. These strategies will help improve
the clarity of your transcriptions, but remember, customization based on your unique use-case
may be beneficial.
Setup
PyDub is a simple and easy-to-use Python library for audio processing tasks such as slicing,
concatenating, and exporting audio files.
The Audio class from the IPython.display module allows you to create an audio control
that can play sound in Jupyter notebooks, providing a straightforward way to play audio
data directly in your notebook.
For our audio file, we'll use a fictional earnings call written by ChatGPT and read aloud by the author. This audio file is relatively short, but hopefully provides you with an illustrative idea of how these pre- and post-processing steps can be applied to any audio file.
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>
At times, files with long silences at the beginning can cause Whisper to transcribe the audio
incorrectly. We'll use Pydub to detect and trim the silence.
Here, we've set the decibel threshold to 20. You can change this if you would like.
return trim_ms
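The helper that this return statement belongs to is not fully shown in the excerpt. A minimal sketch of a PyDub-based silence detector along these lines (the default threshold sign convention and chunk size are assumptions):
def milliseconds_until_sound(sound, silence_threshold_in_decibels=-20.0, chunk_size=10):
    # Step through the audio in small chunks until the loudness rises above the threshold
    trim_ms = 0
    while trim_ms < len(sound) and sound[trim_ms:trim_ms + chunk_size].dBFS < silence_threshold_in_decibels:
        trim_ms += chunk_size
    return trim_ms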
def trim_start(filepath):
path = Path(filepath)
directory = path.parent
filename = path.name
audio = AudioSegment.from_file(filepath, format="wav")
start_trim = milliseconds_until_sound(audio)
trimmed = audio[start_trim:]
new_filename = directory / f"trimmed_{filename}"
trimmed.export(new_filename, format="wav")
return trimmed, new_filename
def transcribe_audio(file,output_dir):
audio_path = os.path.join(output_dir, file)
with open(audio_path, 'rb') as audio_data:
transcription = client.audio.transcriptions.create(
model="whisper-1", file=audio_data)
return transcription.text
At times, we've seen unicode character injection in transcripts; removing any non-ASCII characters should help mitigate this issue.
Keep in mind you should not use this function if you are transcribing in Greek, Cyrillic, Arabic, Chinese, etc.
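A sketch of such a cleanup helper (the function name is an assumption):
def remove_non_ascii(text):
    # Drop every character outside the ASCII range
    return ''.join(char for char in text if ord(char) < 128)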
This function will add formatting and punctuation to our transcript. Whisper generates a
transcript with punctuation but without formatting.
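One way to do this is to ask a chat model to re-format the raw transcript while preserving its words; a hedged sketch (the function name, system prompt, and model choice are assumptions):
def punctuation_assistant(ascii_transcript):
    system_prompt = (
        'You are a helpful assistant that adds punctuation and paragraph breaks to text. '
        'Preserve the original words and only insert the necessary punctuation and formatting.'
    )
    response = client.chat.completions.create(
        model='gpt-4',
        temperature=0,
        messages=[
            {'role': 'system', 'content': system_prompt},
            {'role': 'user', 'content': ascii_transcript},
        ],
    )
    return response.choices[0].message.content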
This function will create a new file with 'trimmed_' prepended to the original file name.
Our fake earnings report audio file is fairly short in length, so we'll adjust the segments
accordingly. Keep in mind you can adjust the segment length as you need.
# Segment audio
trimmed_audio = AudioSegment.from_wav(trimmed_filename)  # Load the trimmed audio file
one_minute = 60 * 1000  # Segment length in milliseconds (PyDub works in milliseconds)
start_time = 0  # Start position of the current segment
i = 0  # Index used for naming the segment files
while start_time < len(trimmed_audio):  # Loop over the trimmed audio file
    segment = trimmed_audio[start_time:start_time + one_minute]  # Extract a segment
    segment.export(os.path.join(output_dir_trimmed, f"trimmed_{i:02d}.wav"), format="wav")  # Save the segment
    start_time += one_minute  # Update the start time for the next segment
    i += 1  # Increment the index for naming the next file
# Get list of trimmed and segmented audio files and sort them numerically
audio_files = sorted(
(f for f in os.listdir(output_dir_trimmed) if f.endswith(".wav")),
key=lambda f: int(''.join(filter(str.isdigit, f)))
)
print(full_transcript)
Good afternoon, everyone. And welcome to FinTech Plus Sync's second quarter 2023 earnings call.
Good afternoon, everyone. And welcome to FinTech Plus Sync's second quarter 2023 earnings call.
print(punctuated_transcript)
Good afternoon, everyone. And welcome to FinTech Plus Sync's second quarter 2023 earnings call.
print(final_transcript)
Good afternoon, everyone. And welcome to FinTech Plus Sync's second quarter 2023 earnings call.
Words Changed:
1. Q2 -> second quarter (Q2)
2. EBITDA -> Earnings Before Interest, Taxes, Depreciation, and Amortization (EBITDA)
3. Q2 2022 -> second quarter (Q2) 2022
4. CDOs -> Collateralized Debt Obligations (CDOs)
5. RMBS -> Residential Mortgage-Backed Securities (RMBS)
6. D/E -> Debt-to-Equity (D/E)
7. CAC -> Customer Acquisition Cost (CAC)
8. LTV -> Lifetime Value (LTV)
9. LTVCAC -> LTV to CAC (LTVCAC)
10. VaR -> Value at Risk (VaR)
11. IPO -> Initial Public Offering (IPO)
12. Q3 -> third quarter (Q3)
Neon is Serverless Postgres built for the cloud. Neon separates compute and storage to offer
modern developer features such as autoscaling, database branching, scale-to-zero, and more.
Vector search
Neon supports vector search using the pgvector open-source PostgreSQL extension, which
enables Postgres as a vector database for storing and querying embeddings.
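As a minimal illustration of that workflow (the connection string, table, and vectors are placeholders; this is not the notebook's exact code), storing and querying embeddings with pgvector from Python might look like:
import psycopg2

conn = psycopg2.connect('<your Neon connection string>')
cur = conn.cursor()
cur.execute('CREATE EXTENSION IF NOT EXISTS vector;')
cur.execute('CREATE TABLE IF NOT EXISTS items (id bigserial PRIMARY KEY, content text, embedding vector(1536));')

# A placeholder 1536-dimensional vector; in practice this would come from the embeddings API
embedding = [0.0] * 1536
vec_literal = '[' + ','.join(str(x) for x in embedding) + ']'

cur.execute('INSERT INTO items (content, embedding) VALUES (%s, %s::vector);', ('hello world', vec_literal))

# Nearest-neighbor search by cosine distance
cur.execute('SELECT content FROM items ORDER BY embedding <=> %s::vector LIMIT 5;', (vec_literal,))
print(cur.fetchall())
conn.commit()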
Check out the notebook in this repo for working with Neon Serverless Postgres as your vector
database.
4. Use Neon with the pgvector extension to perform vector similarity search
Scaling Support
Neon enables you to scale your AI applications with the following features:
Autoscaling: If your AI application experiences heavy load during certain hours of the day
or at different times, Neon can automatically scale compute resources without manual
intervention. During periods of inactivity, Neon is able to scale to zero.
Instant read replicas: Neon supports instant read replicas, which are independent read-
only compute instances designed to perform read operations on the same data as your
read-write computes. With read replicas, you can offload reads from your read-write
compute instance to a dedicated read-only compute instance for your AI application.
The Neon serverless driver: Neon supports a low-latency serverless PostgreSQL driver for
JavaScript and TypeScript applications that allows you to query data from serverless and
edge environments, making it possible to achieve sub-10ms queries.
More Examples
Build an AI-powered semantic search application - Submit a startup idea and get a list of
similar ideas that YCombinator has invested in before
Build an AI-powered chatbot - A Postgres Q&A chatbot that uses Postgres as a vector
database
Vercel Postgres pgvector Starter - Vector similarity search with Vercel Postgres (powered
by Neon)
Additional Resources
Segment Anything is a model from Meta that can be used to select portions of images.
Combined with DALL·E's ability to inpaint specified portions of images, you can use Segment
Anything to easily select any part of an image you'd like to alter.
In this notebook, we'll use these tools to become fashion designers and dynamically replace our
digital models' outfits with tailored, original creations. The notebook follows this flow:
Generate original image: Make an original image that we'll create dynamic masks from.
Create new image: Generate a new image with the masked area inpainted with a fresh
prompt.
Setup
To get started we'll need to follow the instructions for using the Segment Anything (SAM)
model open-sourced by Meta. As of May 2023, the key steps are:
!wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
import cv2
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from matplotlib import rcParams
import numpy as np
from openai import OpenAI
import os
from PIL import Image
import requests
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator, SamPredictor
import torch
First we'll create an original image which we'll generate masks from.
return filepaths
dalle_prompt = '''
Full length, zoomed out photo of our premium Lederhosen-inspired jumpsuit.
Showcase the intricate hand-stitched details and high-quality leather, while highlighting the perfect
This piece appeals to a sophisticated, trendsetting audience who appreciates cultural fusion and inno
'''
Generate Mask
For this demonstration we're picking a UX where we "click" on a point on the image to generate
masks from. However, there are example notebooks provided by Meta which show how to
generate every possible mask for an image, draw a box, and some other useful approaches.
# Check the shape - should be three masks of the same dimensions as our image
masks.shape
# Display the possible masks we can select along with their confidence
for i, (mask, score) in enumerate(zip(masks, scores)):
plt.figure(figsize=(10, 10))
plt.imshow(image)
show_mask(mask, plt.gca())
show_points(input_point, input_label, plt.gca())
plt.title(f"Mask {i+1}, Score: {score:.3f}", fontsize=18)
plt.axis("off")
plt.show()
# We'll now reverse the mask so that it is clear and everything else is white
chosen_mask = chosen_mask.astype("uint8")
chosen_mask[chosen_mask != 0] = 255
chosen_mask[chosen_mask == 0] = 1
chosen_mask[chosen_mask == 255] = 0
chosen_mask[chosen_mask == 1] = 255
# Convert mask back to pixels to add our mask replacing the third dimension
pix = np.array(mask)
pix[:, :, 3] = chosen_mask
Now we'll combine our original image with the mask and use DALL·E's Edit endpoint to inpaint the transparent area according to a new prompt. (As of January 2024, dall-e-2 is the only model that supports edits.)
# edit an image
edit_response = client.images.edit(
image=open(chosen_image, "rb"), # from the generation section
mask=open(os.path.join(mask_dir, "new_mask.png"), "rb"), # from right above
prompt="Brilliant leather Lederhosen with a formal look, detailed, intricate, photorealistic", #
n=3,
size="1024x1024",
response_format="url",
)
edit_filepaths = process_dalle_images(edit_response, "edits", edit_image_dir)
# read images
img_A = mpimg.imread(edit_filepaths[0])
img_B = mpimg.imread(edit_filepaths[1])
img_C = mpimg.imread(edit_filepaths[2])
# display images
fig, ax = plt.subplots(1,3)
[a.axis("off") for a in ax]
ax[0].imshow(img_A)
ax[1].imshow(img_B)
ax[2].imshow(img_C)
<matplotlib.image.AxesImage at 0x791b1f4c58a0>
Beautiful!
Now you too can easily create dynamic masks to extend your images - enjoy the APIs, and
please share what you build!
In this notebook we delve into the evaluation techniques for abstractive summarization tasks
using a simple example. We explore traditional evaluation methods like ROUGE and BERTScore,
in addition to showcasing a more novel approach using LLMs as evaluators.
In addition to these traditional metrics, we showcase a method (G-Eval) that leverages Large
Language Models (LLMs) as a novel, reference-free metric for assessing abstractive summaries.
In this case, we use gpt-4 to score candidate outputs. gpt-4 has effectively learned an internal
model of language quality that allows it to differentiate between fluent, coherent text and low-
quality text. Harnessing this internal scoring mechanism allows auto-evaluation of new
candidate outputs generated by an LLM.
Setup
# BERTScore leverages the pre-trained contextual embeddings from BERT and matches words in candidate
from bert_score import BERTScorer
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>
Example task
For the purposes of this notebook we'll use the example summarization below. Notice that we
provide two generated summaries to compare, and a reference human-written summary, which
evaluation metrics like ROUGE and BERTScore require.
Excerpt ( excerpt ):
“OpenAI's mission is to ensure that artificial general intelligence (AGI) benefits all of
humanity. OpenAI will build safe and beneficial AGI directly, but will also consider its
mission fulfilled if its work aids others to achieve this outcome. OpenAI follows several key
principles for this purpose. First, broadly distributed benefits - any influence over AGI's
deployment will be used for the benefit of all, and to avoid harmful uses or undue
concentration of power. Second, long-term safety - OpenAI is committed to doing the
research to make AGI safe, and to promote the adoption of such research across the AI
community. Third, technical leadership - OpenAI aims to be at the forefront of AI
capabilities. Fourth, a cooperative orientation - OpenAI actively cooperates with other
research and policy institutions, and seeks to create a global community working together
to address AGI's global challenges.”
Summaries:
Reference Summary / ref_summary (human-written): OpenAI aims to ensure artificial general intelligence (AGI) is used for everyone's benefit, avoiding harmful uses or undue power concentration. It is committed to researching AGI safety, promoting such studies among the AI community. OpenAI seeks to lead in AI capabilities and cooperates with global research and policy institutions to address AGI's challenges.
Eval Summary 1 / eval_summary_1 (system-generated): OpenAI aims to AGI benefits all humanity, avoiding harmful uses and power concentration. It pioneers research into safe and beneficial AGI and promotes adoption globally. OpenAI maintains technical leadership in AI while cooperating with global institutions to address AGI challenges. It seeks to lead a collaborative worldwide effort developing AGI for collective good.
Eval Summary 2 / eval_summary_2 (system-generated): OpenAI aims to ensure AGI is for everyone's use, totally avoiding harmful stuff or big power concentration. Committed to researching AGI's safe side, promoting these studies in AI folks. OpenAI wants to be top in AI things and works with worldwide research, policy groups to figure AGI's stuff.
Take a moment to figure out which summary you'd personally prefer and the one that captures
OpenAI's mission really well.
excerpt = "OpenAI's mission is to ensure that artificial general intelligence (AGI) benefits all of h
ref_summary = "OpenAI aims to ensure artificial general intelligence (AGI) is used for everyone's ben
eval_summary_1 = "OpenAI aims to AGI benefits all humanity, avoiding harmful uses and power concentra
eval_summary_2 = "OpenAI aims to ensure AGI is for everyone's use, totally avoiding harmful stuff or
ROUGE, which stands for Recall-Oriented Understudy for Gisting Evaluation, primarily gauges
the overlap of words between a generated output and a reference text. It's a prevalent metric
for evaluating automatic summarization tasks. Among its variants, ROUGE-L offers insights into
the longest contiguous match between system-generated and reference summaries, gauging
how well the system retains the original summary's essence.
# function to calculate the Rouge score
def get_rouge_scores(text1, text2):
rouge = Rouge()
return rouge.get_scores(text1, text2)
rouge_scores_out = []
# Reconstructed loop: compute the F-scores of each candidate summary against the reference
eval_1_rouge = get_rouge_scores(eval_summary_1, ref_summary)
eval_2_rouge = get_rouge_scores(eval_summary_2, ref_summary)
for metric in ["rouge-1", "rouge-2", "rouge-l"]:
    for label in ["F-Score"]:
        eval_1_score = eval_1_rouge[0][metric]["f"]
        eval_2_score = eval_2_rouge[0][metric]["f"]
        row = {
            "Metric": f"{metric} ({label})",
            "Summary 1": eval_1_score,
            "Summary 2": eval_2_score,
        }
        rouge_scores_out.append(row)
def highlight_max(s):
is_max = s == s.max()
return [
"background-color: lightgreen" if v else "background-color: white"
for v in is_max
]
rouge_scores_out = (
pd.DataFrame(rouge_scores_out)
.set_index("Metric")
.style.apply(highlight_max, axis=1)
)
rouge_scores_out
(Styled dataframe output: ROUGE F-scores indexed by Metric, with columns Summary 1 and Summary 2; the higher score in each row is highlighted.)
The table shows the ROUGE scores for evaluating two different summaries against a reference
text. In the case of rouge-1, Summary 2 outperforms Summary 1, indicating a better overlap of individual words; for rouge-l, Summary 2 also has a higher score, implying a closer match in the longest common subsequences and thus a potentially better overall summarization in
capturing the main content and order of the original text. Since Summary 2 has many words
and short phrases directly lifted from the excerpt, its overlap with the reference summary would
likely be higher, leading to higher ROUGE scores.
While ROUGE and similar metrics, such as BLEU and METEOR, offer quantitative measures, they
often fail to capture the true essence of a well-generated summary, and they tend to correlate only weakly with human judgments. Given the advancements in LLMs, which are adept at producing fluent and
coherent summaries, traditional metrics like ROUGE may inadvertently penalize these models.
This is especially true if the summaries are articulated differently but still encapsulate the core
information accurately.
ROUGE relies on the exact presence of words in both the predicted and reference texts, failing
to interpret the underlying semantics. This is where BERTScore comes in and leverages the
contextual embeddings from the BERT model, aiming to evaluate the similarity between a
predicted and a reference sentence in the context of machine-generated text. By comparing
embeddings from both sentences, BERTScore captures semantic similarities that might be
missed by traditional n-gram based metrics.
Some weights of the model checkpoint at roberta-large were not used when initializing RobertaMo
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that
The close F1 Scores between the summaries indicate that they may perform similarly in
capturing the key information. However, this small difference should be interpreted with
caution. Since BERTScore may not fully grasp subtleties and high-level concepts that a human
evaluator might understand, reliance solely on this metric could lead to misinterpreting the
actual quality and nuances of the summary. An integrated approach combining BERTScore with
human judgment and other metrics could offer a more reliable evaluation.
Here we implement an example reference-free text evaluator using gpt-4 , inspired by the G-
Eval framework which evaluates the quality of generated text using large language models.
Unlike metrics like ROUGE or BERTScore that rely on comparison to reference summaries, the
gpt-4 based evaluator assesses the quality of generated content based solely on the input
prompt and text, without any ground truth references. This makes it applicable to new datasets
and tasks where human references are sparse or unavailable.
1. Relevance: Evaluates if the summary includes only important information and excludes redundancies.
2. Coherence: Assesses the logical flow and organization of the summary.
3. Consistency: Checks if the summary aligns with the facts in the source document.
4. Fluency: Rates the grammar and readability of the summary.
We then generate scores from gpt-4 with the defined prompts, comparing them across summaries.
In this demonstration, we're using a direct scoring function where gpt-4 generates a discrete
score (1-5) for each metric. Normalizing the scores and taking a weighted sum could result in
more robust, continuous scores that better reflect the quality and diversity of the summaries.
EVALUATION_PROMPT_TEMPLATE = """
Evaluation Criteria:
{criteria}
Evaluation Steps:
{steps}
Example:
Source Text:
{document}
Summary:
{summary}
- {metric_name}
"""
# Metric 1: Relevance
RELEVANCY_SCORE_CRITERIA = """
Relevance(1-5) - selection of important content from the source. \
The summary should include only important information from the source document. \
Annotators were instructed to penalize summaries which contained redundancies and excess information
"""
RELEVANCY_SCORE_STEPS = """
1. Read the summary and the source document carefully.
2. Compare the summary to the source document and identify the main points of the article.
3. Assess how well the summary covers the main points of the article, and how much irrelevant or redu
4. Assign a relevance score from 1 to 5.
"""
# Metric 2: Coherence
COHERENCE_SCORE_CRITERIA = """
Coherence(1-5) - the collective quality of all sentences. \
We align this dimension with the DUC quality question of structure and coherence \
whereby "the summary should be well-structured and well-organized. \
The summary should not just be a heap of related information, but should build from sentence to a\
coherent body of information about a topic."
"""
COHERENCE_SCORE_STEPS = """
1. Read the article carefully and identify the main topic and key points.
2. Read the summary and compare it to the article. Check if the summary covers the main topic and key
and if it presents them in a clear and logical order.
3. Assign a score for coherence on a scale of 1 to 5, where 1 is the lowest and 5 is the highest base
"""
# Metric 3: Consistency
CONSISTENCY_SCORE_CRITERIA = """
Consistency(1-5) - the factual alignment between the summary and the summarized source. \
A factually consistent summary contains only statements that are entailed by the source document. \
Annotators were also asked to penalize summaries that contained hallucinated facts.
"""
CONSISTENCY_SCORE_STEPS = """
1. Read the article carefully and identify the main facts and details it presents.
2. Read the summary and compare it to the article. Check if the summary contains any factual errors t
3. Assign a score for consistency based on the Evaluation Criteria.
"""
# Metric 4: Fluency
FLUENCY_SCORE_CRITERIA = """
Fluency(1-3): the quality of the summary in terms of grammar, spelling, punctuation, word choice, and
1: Poor. The summary has many errors that make it hard to understand or sound unnatural.
2: Fair. The summary has some errors that affect the clarity or smoothness of the text, but the main
3: Good. The summary has few or no errors and is easy to read and follow.
"""
FLUENCY_SCORE_STEPS = """
Read the summary and evaluate its fluency based on the given criteria. Assign a fluency score from 1
"""
def get_geval_score(
criteria: str, steps: str, document: str, summary: str, metric_name: str
):
prompt = EVALUATION_PROMPT_TEMPLATE.format(
criteria=criteria,
steps=steps,
metric_name=metric_name,
document=document,
summary=summary,
)
response = client.chat.completions.create(
model="gpt-4",
messages=[{"role": "user", "content": prompt}],
temperature=0,
max_tokens=5,
top_p=1,
frequency_penalty=0,
presence_penalty=0,
)
return response.choices[0].message.content
evaluation_metrics = {
"Relevance": (RELEVANCY_SCORE_CRITERIA, RELEVANCY_SCORE_STEPS),
"Coherence": (COHERENCE_SCORE_CRITERIA, COHERENCE_SCORE_STEPS),
"Consistency": (CONSISTENCY_SCORE_CRITERIA, CONSISTENCY_SCORE_STEPS),
"Fluency": (FLUENCY_SCORE_CRITERIA, FLUENCY_SCORE_STEPS),
}
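The loop that produces the score table below is not included in this excerpt; one way it could be written (the summaries dictionary and the assumption that gpt-4 returns a bare integer are ours):
summaries = {'Summary 1': eval_summary_1, 'Summary 2': eval_summary_2}
data = {'Evaluation Type': [], 'Summary Type': [], 'Score': []}

for eval_type, (criteria, steps) in evaluation_metrics.items():
    for summ_type, summary in summaries.items():
        result = get_geval_score(criteria, steps, excerpt, summary, eval_type)
        data['Evaluation Type'].append(eval_type)
        data['Summary Type'].append(summ_type)
        data['Score'].append(int(result.strip()))  # assumes the model returns just a number

pivot_df = pd.DataFrame(data).pivot(index='Evaluation Type', columns='Summary Type', values='Score')
print(pivot_df)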
Evaluation Type    Summary 1    Summary 2
Coherence                  5            3
Consistency                5            5
Fluency                    3            2
Relevance                  5            4
Overall, Summary 1 appears to outperform Summary 2 in three of the four categories (Coherence, Relevance and Fluency), while the two receive the same Consistency score. The result might suggest that Summary 1 is generally preferable based on the given evaluation criteria.
Limitations
Note that LLM-based metrics could have a bias towards preferring LLM-generated texts over human-written texts. Additionally, LLM-based metrics are sensitive to system messages/prompts.
We recommend experimenting with other techniques that can help improve performance
and/or get consistent scores, striking the right balance between high-quality expensive
evaluation and automated evaluations. It is also worth noting that this scoring methodology is
currently limited by gpt-4 's context window.
Conclusion
Evaluating abstractive summarization remains an open area for further improvement. Traditional
metrics like ROUGE , BLEU , and BERTScore provide useful automatic evaluation but have
limitations in capturing semantic similarity and nuanced aspects of summarization quality.
Moreover, they require reference outputs which can be expensive to collect/label. LLM-based
metrics offer promise as a reference-free method of evaluating coherence, fluency, and
relevance. However, they too have potential biases favoring text generated by LLMs. Ultimately,
a combination of automatic metrics and human evaluation is ideal for reliably assessing
abstractive summarization systems. While human evaluation is indispensable for gaining a
comprehensive understanding of summary quality, it should be complemented with automated
evaluation to enable efficient, large-scale testing. The field will continue to evolve more robust
evaluation techniques, balancing quality, scalability, and fairness. Advancing evaluation methods
is crucial for driving progress in production applications.
References
G-EVAL: NLG Evaluation Using GPT-4 with Better Human Alignment - Liu Y, Iter D, Xu Y,
Wang S, Xu R, Zhu C. Published May, 2023.
Elasticsearch
Liam Thompson
Aug 28, 2023
The elasticsearch-labs repo contains executable Python notebooks, sample apps, and
resources for testing out the Elastic platform.
Semantic search
Vector search is an innovative technology that enables developers and engineers to efficiently
store, search, and recommend information by representing complex data as mathematical
vectors. By comparing the similarities between these vectors, you can quickly retrieve relevant
information in a seamless and intuitive manner.
Pinecone is a vector database designed with developers and engineers in mind. As a managed
service, it alleviates the burden of maintenance and engineering, allowing you to focus on
extracting valuable insights from your data. The free tier supports up to 5 million vectors,
making it an accessible and cost-effective way to experiment with vector search capabilities.
With Pinecone, you'll experience impressive speed, accuracy, and scalability, as well as access to
advanced features like single-stage metadata filtering and the cutting-edge sparse-dense index.
Examples
This folder contains examples of using Pinecone and OpenAI together. More will be added over
time so check back for updates!
In this notebook we will be going over generating embeddings of movie descriptions with
OpenAI and using those embeddings within Milvus to find relevant movies. To narrow our
search results and try something new, we are going to be using filtering to do metadata
searches. The dataset in this example is sourced from HuggingFace datasets, and contains a
little over 8 thousand movie entries.
Let's begin by first downloading the required libraries for this notebook:
With the required packages installed, we can get started. Let's begin by launching the Milvus
service. The file being run is the docker-compose.yaml found in the folder of this file. This
command launches a Milvus standalone instance which we will use for this test.
! docker compose up -d
import openai
HOST = 'localhost'
PORT = 19530
COLLECTION_NAME = 'movie_search'
DIMENSION = 1536
OPENAI_ENGINE = 'text-embedding-3-small'
openai.api_key = 'sk-your_key'
INDEX_PARAM = {
'metric_type':'L2',
'index_type':"HNSW",
'params':{'M': 8, 'efConstruction': 64}
}
QUERY_PARAM = {
"metric_type": "L2",
"params": {"ef": 64},
}
BATCH_SIZE = 1000
Dataset
With Milvus up and running we can begin grabbing our data. Hugging Face Datasets is a hub
that holds many different user datasets, and for this example we are using HuggingLearners's
netflix-shows dataset. This dataset contains movies and their metadata pairs for over 8
thousand movies. We are going to embed each description and store it within Milvus along with
its title, type, release_year and rating.
import datasets
Now that we have our data on our machine we can begin embedding it and inserting it into
Milvus. The embedding function takes in text and returns the embeddings in a list format.
This next step does the actual inserting. We iterate through all the entries and create batches
that we insert once we hit our set batch size. After the loop is over, we insert the last remaining batch if it exists.
data = [
[], # title
[], # type
[], # release_year
[], # rating
[], # description
]
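The embedding helper and the insertion loop described above are not reproduced in this excerpt; a sketch under a few assumptions (the dataset id, the pymilvus Collection having already been created with a matching schema, and the openai v1 module-level client):
import datasets
from pymilvus import Collection, connections

connections.connect(host=HOST, port=PORT)
collection = Collection(COLLECTION_NAME)  # assumes the collection and its index were created earlier

dataset = datasets.load_dataset('hugginglearners/netflix-shows', split='train')  # dataset id is an assumption

def embed(texts):
    # One embeddings call per batch of descriptions
    response = openai.embeddings.create(input=texts, model=OPENAI_ENGINE)
    return [item.embedding for item in response.data]

for movie in dataset:
    data[0].append(movie['title'] or '')
    data[1].append(movie['type'] or '')
    data[2].append(movie['release_year'] or -1)
    data[3].append(movie['rating'] or '')
    data[4].append(movie['description'] or '')
    if len(data[0]) % BATCH_SIZE == 0:
        data.append(embed(data[4]))  # add the embeddings column
        collection.insert(data)
        data = [[], [], [], [], []]

# Embed and insert the last, partially filled batch if it exists
if len(data[0]) != 0:
    data.append(embed(data[4]))
    collection.insert(data)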
With our data safely inserted in Milvus, we can now perform a query. The query takes in a tuple of the movie description you are searching for and the filter to use. More info about the filter can be found here. The search first prints out your description and filter expression. After that, for each result, we print the score, title, type, release year, rating, and description of the resulting movies.
import textwrap
my_query = ('movie about a fluffy animal', 'release_year < 2019 and rating like \"PG%\"')
query(my_query)
Description: movie about a fluffy animal Expression: release_year < 2019 and rating like "PG%"
Results:
Rank: 1 Score: 0.30083978176116943 Title: The Lamb
Type: Movie Release Year: 2017 Rating: PG
A big-dreaming donkey escapes his menial existence and befriends some free-spirited
animal pals in this imaginative retelling of the Nativity Story.
Rank: 2 Score: 0.33528298139572144 Title: Puss in Boots
Type: Movie Release Year: 2011 Rating: PG
The fabled feline heads to the Land of Giants with friends Humpty Dumpty and Kitty
Softpaws on a quest to nab its greatest treasure: the Golden Goose.
People are writing great tools and papers for improving outputs from GPT. Here are some cool
ones we've seen:
Arthur Shield: A paid product for detecting toxicity, hallucination, prompt injection, etc.
Embedchain: A Python library for managing and syncing unstructured data with LLMs.
FLAML (A Fast Library for Automated Machine Learning & Tuning): A Python library for
automating selection of models, hyperparameters, and other tunable choices.
Guardrails.ai: A Python library for validating outputs and retrying failures. Still in alpha, so
expect sharp edges and bugs.
Guidance: A handy looking Python library from Microsoft that uses Handlebars templating
to interleave generation, prompting, and logical control.
LiteLLM: A minimal Python library for calling LLM APIs with a consistent format.
OpenAI Evals: An open-source library for evaluating task performance of language models
and prompts.
Parea AI: A platform for debugging, testing, and monitoring LLM apps.
Portkey: A platform for observability, model management, evals, and security for LLM apps.
Promptify: A small Python library for using language models to perform NLP tasks.
Prompttools: Open-source Python tools for testing and evaluating models, vector DBs, and
prompts.
Scale Spellbook: A paid product for building, comparing, and shipping language model
apps.
Weights & Biases: A paid product for tracking model training and prompt engineering
experiments.
YiVal: An open-source GenAI-Ops tool for tuning and evaluating prompts, retrieval
configurations, and model parameters using customizable datasets, evaluation methods,
and evolution strategies.
Prompting guides
Brex's Prompt Engineering Guide: Brex's introduction to language models and prompt
engineering.
Xavi Amatriain's Prompt Engineering 101: Introduction to Prompt Engineering and 202: Advanced Prompt Engineering: a basic but opinionated introduction to prompt engineering, and a follow-up collection covering many advanced methods, starting with CoT.
Video courses
Andrej Karpathy's Let's build GPT: A detailed dive into the machine learning underlying
GPT.
Scrimba course about Assistants API: A 30-minute interactive course about the Assistants
API.
LinkedIn course: Introduction to Prompt Engineering: How to talk to the AIs: Short video
introduction to prompt engineering
Tree of Thoughts: Deliberate Problem Solving with Large Language Models (2023):
Searching over trees of step by step reasoning helps even more than voting over chains of
thought. It lifts GPT-4 's scores on creative writing and crosswords.
Language Models are Zero-Shot Reasoners (2022): Telling instruction-following models to
think step by step improves their reasoning. It lifts text-davinci-002 's score on math
word problems (GSM8K) from 13% to 41%.
Faithful Reasoning Using Large Language Models (2022): Reasoning can be improved by
a system that combines: chains of thought generated by alternative selection and inference
prompts, a halter model that chooses when to halt selection-inference loops, a value
function to search over multiple reasoning paths, and sentence labels that help avoid
hallucination.
STaR: Bootstrapping Reasoning With Reasoning (2022): Chain of thought reasoning can
be baked into models via fine-tuning. For tasks with an answer key, example chains of
thoughts can be generated by language models.
ReAct: Synergizing Reasoning and Acting in Language Models (2023): For tasks with tools
or an environment, chain of thought works better if you prescriptively alternate between
Reasoning steps (thinking about what to do) and Acting (getting information from a tool or
environment).
In this example notebook, we showcase how to perform financial analysis over 10-K documents
using the LlamaIndex framework, with just a few lines of code.
Notebook Outline
Introduction
Setup
Simple QA
Introduction
LlamaIndex
LlamaIndex is a data framework for LLM applications. You can get started with just a few lines of
code and build a retrieval-augmented generation (RAG) system in minutes. For more advanced
users, LlamaIndex offers a rich toolkit for ingesting and indexing your data, modules for retrieval
and re-ranking, and composable components for building custom query engines.
We showcase how LlamaIndex can support a financial analyst in quickly extracting information
and synthesizing insights across multiple documents with very little coding.
Setup
Before we start, we can configure the LLM provider and model that will power our RAG system.
Here, we pick gpt-3.5-turbo-instruct from OpenAI.
We construct a ServiceContext and set it as the global default, so all subsequent operations
that depend on LLM calls will use the model we configured here.
service_context = ServiceContext.from_defaults(llm=llm)
set_global_service_context(service_context=service_context)
Data Loading and Indexing
Now, we load and parse 2 PDFs (one for Uber 10-K in 2021 and another for Lyft 10-k in 2021).
Under the hood, the PDFs are converted to plain text Document objects, separated by page.
“Note: this operation might take a while to run, since each document is more than 100
pages.”
lyft_docs = SimpleDirectoryReader(input_files=["../data/10k/lyft_2021.pdf"]).load_data()
uber_docs = SimpleDirectoryReader(input_files=["../data/10k/uber_2021.pdf"]).load_data()
Now, we can build an (in-memory) VectorStoreIndex over the documents that we've loaded.
“Note: this operation might take a while to run, since it calls OpenAI API for computing
vector embedding over document chunks.”
lyft_index = VectorStoreIndex.from_documents(lyft_docs)
uber_index = VectorStoreIndex.from_documents(uber_docs)
Simple QA
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)
uber_engine = uber_index.as_query_engine(similarity_top_k=3)
response = await lyft_engine.aquery('What is the revenue of Lyft in 2021? Answer in millions with pag
print(response)
response = await uber_engine.aquery('What is the revenue of Uber in 2021? Answer in millions, with pa
print(response)
For more complex financial analysis, one often needs to reference multiple documents.
As an example, let's take a look at how to do compare-and-contrast queries over both Lyft and
Uber financials.
For this, we build a SubQuestionQueryEngine, which breaks a complex compare-and-contrast query into simpler sub-questions, each executed on the respective sub query engine backed by an individual index.
query_engine_tools = [
QueryEngineTool(
query_engine=lyft_engine,
metadata=ToolMetadata(name='lyft_10k', description='Provides information about Lyft financial
),
QueryEngineTool(
query_engine=uber_engine,
metadata=ToolMetadata(name='uber_10k', description='Provides information about Uber financial
),
]
s_engine = SubQuestionQueryEngine.from_defaults(query_engine_tools=query_engine_tools)
response = await s_engine.aquery('Compare and contrast the customer segments and geographies that gre
The customer segments that grew the fastest for Uber in 2021 were its Mobility Drivers, Courier
[uber_10k] Q: What geographies grew the fastest for Uber
[uber_10k] A:
Based on the context information, it appears that Uber experienced the most growth in large met
[lyft_10k] Q: What customer segments grew the fastest for Lyft
[lyft_10k] A:
The customer segments that grew the fastest for Lyft were ridesharing, light vehicles, and publ
[lyft_10k] Q: What geographies grew the fastest for Lyft
[lyft_10k] A:
It is not possible to answer this question with the given context information.
print(response)
The customer segments that grew the fastest for Uber in 2021 were its Mobility Drivers, Courier
The customer segments that grew the fastest for Lyft were ridesharing, light vehicles, and publ
In summary, Uber and Lyft both experienced growth in customer segments related to mobility, cou
response = await s_engine.aquery('Compare revenue growth of Uber and Lyft from 2020 to 2021')
print(response)
The revenue growth of Uber from 2020 to 2021 was 57%, or 54% on a constant currency basis, whil
The aim of this notebook is to walk through a comprehensive example of how to fine-tune
OpenAI models for Retrieval Augmented Generation (RAG).
We will also be integrating Qdrant and Few-Shot Learning to boost the model's performance
and reduce hallucinations. This could serve as a practical guide for ML practitioners, data
scientists, and AI Engineers interested in leveraging the power of OpenAI models for specific
use-cases. 🤩
Why should you read this blog?
Use fine-tuning to improve the correctness of your RAG model and reduce hallucinations
To begin, we've selected a dataset where we have a guarantee that the retrieval is perfect. We've
selected a subset of the SQuAD dataset, which is a collection of questions and answers about
Wikipedia articles. We've also included samples where the answer is not present in the context,
to demonstrate how RAG handles this case.
Table of Contents
1. Setting up the Environment
8. Evaluation
9. Conclusion
Aggregate Results
Observations
Retrieval Augmented Generation (RAG)? The phrase Retrieval Augmented Generation (RAG)
comes from a recent paper by Lewis et al. from Facebook AI. The idea is to use a pre-trained
language model (LM) to generate text, but to use a separate retrieval system to find relevant
documents to condition the LM on.
What is Qdrant? Qdrant is an open-source vector search engine that allows you to search for
similar vectors in a large dataset. It is built in Rust and here we'll use the Python client to
interact with it. This is the Retrieval part of RAG.
What is Few-Shot Learning? Few-shot learning is a technique where the model is "improved" by being given a small number of examples of the task, either in the prompt or via fine-tuning. In this case, we'll use it to fine-tune the RAG model on a small number of examples from the SQuAD dataset. This is the Augmented part of RAG.
What is Zero-Shot Learning? Zero-shot learning is a type of machine learning where the model
is "improved" via training or fine-tuning without any dataset specific information.
What is Fine-Tuning? Fine-tuning is the process of "improving" a pre-trained model by further training it on a small, task-specific dataset. In this case, we'll use it to fine-tune the model on a small number of examples from the SQuAD dataset. The LLM is what makes the Generation part of RAG.
!pip install pandas openai tqdm tenacity scikit-learn tiktoken python-dotenv seaborn --upgrade --quie
import json
import os
import time
import pandas as pd
from openai import OpenAI
import tiktoken
import seaborn as sns
from tenacity import retry, wait_exponential
from tqdm import tqdm
from collections import defaultdict
import numpy as np
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix
import warnings
warnings.filterwarnings('ignore')
tqdm.pandas()
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>
os.environ["QDRANT_URL"] = "https://xxx.cloud.qdrant.io:6333"
os.environ["QDRANT_API_KEY"] = "xxx"
Section A
For the purpose of demonstration, we'll make small slices from the train and validation splits of
the SQuADv2 dataset. This dataset has questions and contexts where the answer is not present
in the context, to help us evaluate how LLM handles this case.
We'll read the data from the JSON files and create a dataframe with the following columns:
question , context , answer , is_impossible .
# !mkdir -p local_cache
# !wget https://rajpurkar.github.io/SQuAD-explorer/dataset/train-v2.0.json -O local_cache/train.json
# !wget https://rajpurkar.github.io/SQuAD-explorer/dataset/dev-v2.0.json -O local_cache/dev.json
def json_to_dataframe_with_titles(json_data):
    # The nested loops below are reconstructed following the standard SQuAD v2 JSON layout
    qas = []
    context = []
    is_impossible = []
    answers = []
    titles = []
    for article in json_data['data']:
        title = article['title']
        for paragraph in article['paragraphs']:
            for qa in paragraph['qas']:
                qas.append(qa['question'])
                context.append(paragraph['context'])
                is_impossible.append(qa['is_impossible'])
                ans_list = []
                for ans in qa['answers']:
                    ans_list.append(ans['text'])
                answers.append(ans_list)
                titles.append(title)
    return pd.DataFrame({'title': titles, 'question': qas, 'context': context,
                         'is_impossible': is_impossible, 'answers': answers})
train_df = json_to_dataframe_with_titles(json.load(open('local_cache/train.json')))
val_df = json_to_dataframe_with_titles(json.load(open('local_cache/dev.json')))
Let's start by using the base gpt-3.5-turbo-0613 model to answer the questions. This prompt is
a simple concatenation of the question and context, with a separator token in between: \n\n .
We have a simple instruction at the start of the prompt:
“Answer the following Question based on the Context only. Only answer from the Context. If
you don't know the answer, say 'I don't know'.”
Other prompts are possible, but this is a good starting point. We'll use this prompt to answer
the questions in the validation set.
Next, you'll need some reusable functions that make an OpenAI API call and return the answer. You'll use the chat completions endpoint of the API (client.chat.completions.create), which takes a list of messages and returns the completed text.
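One such helper is not reproduced here; a sketch of what it could look like (the function name, retry policy, and temperature are assumptions; the instruction text is the one quoted above):
@retry(wait=wait_exponential(multiplier=1, min=2, max=60))
def answer_question(question, context, model='gpt-3.5-turbo-0613'):
    prompt = (
        "Answer the following Question based on the Context only. "
        "Only answer from the Context. If you don't know the answer, say 'I don't know'.\n\n"
        f"Question: {question}\n\n"
        f"Context: {context}\n\n"
        "Answer:\n"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[
            {'role': 'system', 'content': 'You are a helpful assistant.'},
            {'role': 'user', 'content': prompt},
        ],
        temperature=0,
    )
    return response.choices[0].message.content.strip()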
df
(Truncated dataframe preview: e.g. a row with title "French and Indian War", is_impossible True, and an empty answers list.)
For the complete fine-tuning process, please refer to the OpenAI Fine-Tuning Docs.
We need to prepare the data for fine-tuning. We'll use a few samples from train split of same
dataset as before, but we'll add the answer to the context. This will help the model learn to
retrieve the answer from the context.
Our instruction prompt is the same as before, and so is the system prompt.
def dataframe_to_jsonl(df):
def create_jsonl_entry(row):
answer = row["answers"][0] if row["answers"] else "I don't know"
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{
"role": "user",
"content": f"""Answer the following Question based on the Context only. Only answer f
Question: {row.question}\n\n
Context: {row.context}\n\n
Answer:\n""",
},
{"role": "assistant", "content": answer},
]
        return json.dumps({"messages": messages})

    # Reconstructed closing lines: serialize every row and join into a JSONL string
    jsonl_output = df.apply(create_jsonl_entry, axis=1)
    return "\n".join(jsonl_output)
You can see this cookbook for more details on how to prepare the data for fine-tuning.
If you're new to OpenAI Model Fine-Tuning, please refer to the How to finetune Chat models
notebook. You can also refer to the OpenAI Fine-Tuning Docs for more details.
class OpenAIFineTuner:
"""
Class to fine tune OpenAI models
"""
def __init__(self, training_file_path, model_name, suffix):
self.training_file_path = training_file_path
self.model_name = model_name
self.suffix = suffix
self.file_object = None
self.fine_tuning_job = None
self.model_id = None
def create_openai_file(self):
self.file_object = client.files.create(
file=open(self.training_file_path, "rb"),
purpose="fine-tune",
)
def create_fine_tuning_job(self):
self.fine_tuning_job = client.fine_tuning.jobs.create(
training_file=self.file_object.id,
model=self.model_name,
suffix=self.suffix,
)
def retrieve_fine_tuned_model(self):
self.model_id = client.fine_tuning.jobs.retrieve(self.fine_tuning_job.id).fine_tuned_model
return self.model_id
def fine_tune_model(self):
self.create_openai_file()
self.wait_for_file_processing()
self.create_fine_tuning_job()
self.wait_for_fine_tuning()
return self.retrieve_fine_tuned_model()
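The wait_for_file_processing and wait_for_fine_tuning helpers called above are not shown in this excerpt; minimal polling sketches (the sleep intervals and status handling are assumptions) could look like:
    def wait_for_file_processing(self, sleep_time=20):
        # Poll until the uploaded training file has been processed
        while client.files.retrieve(self.file_object.id).status != "processed":
            time.sleep(sleep_time)

    def wait_for_fine_tuning(self, sleep_time=45):
        # Poll until the fine-tuning job finishes (successfully or not)
        while True:
            status = client.fine_tuning.jobs.retrieve(self.fine_tuning_job.id).status
            if status == "succeeded":
                break
            if status in ("failed", "cancelled"):
                raise RuntimeError(f"Fine-tuning job ended with status: {status}")
            time.sleep(sleep_time)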
fine_tuner = OpenAIFineTuner(
training_file_path="local_cache/100_train.jsonl",
model_name="gpt-3.5-turbo",
suffix="100trn20230907"
)
model_id = fine_tuner.fine_tune_model()
model_id
Let's try out the fine-tuned model on the same validation set as before. You'll use the same
prompt as before, but you will use the fine-tuned model instead of the base model. Before you
do that, you can make a simple call to get a sense of how the fine-tuned model is doing.
completion = client.chat.completions.create(
model=model_id,
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello!"},
{"role": "assistant", "content": "Hi, how can I help you today?"},
{
"role": "user",
"content": "Can you answer the following question based on the given context? If not, say
},
],
)
print(completion.choices[0].message)
This is the same as before, but you'll use the fine-tuned model instead of the base model.
To evaluate the model's performance, compare the predicted answer to the actual answers -- if
any of the actual answers are present in the predicted answer, then it's a match. We've also
created error categories to help you understand where the model is struggling.
When we know that a correct answer exists in the context, we can measure the model's
performance, there are 3 possible outcomes:
1. ✅ Answered Correctly: The model responded the correct answer. It may have also
included other answers that were not in the context.
2. ❎ Skipped: The model responded with "I don't know" (IDK) while the answer was present in the context. It's better for the model to say "I don't know" than to give a wrong answer. In our design, we know that a true answer exists and hence we're able to measure it -- this is not always the case. This is a model error. We exclude this from the overall error rate.
3. ❌ Wrong: The model responded with an incorrect answer. This is a model ERROR.
When we know that a correct answer does not exist in the context, we can measure the model's
performance, there are 2 possible outcomes:
4. ❌ Hallucination: The model responded with an answer, when "I don't know" was
expected. This is a model ERROR.
5. ✅ I don't know: The model responded with "I don't know" (IDK) and the answer was not
present in the context. This is a model WIN.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
class Evaluator:
    def __init__(self, df):
        self.df = df
        self.y_pred = pd.Series()  # Initialize as empty Series
        self.labels_answer_expected = ["✅ Answered Correctly", "❎ Skipped", "❌ Wrong Answer"]
        self.labels_idk_expected = ["❌ Hallucination", "✅ I don't know"]

    # Classify a single generated answer when a true answer exists in the context
    def _evaluate_answer_expected(self, generated_answer, actual_answers):
        return (
            "✅ Answered Correctly" if any(ans in generated_answer for ans in actual_answers)
            else "❎ Skipped" if generated_answer == "i don't know"
            else "❌ Wrong Answer"
        )

    # Classify a single generated answer when no true answer exists in the context
    def _evaluate_idk_expected(self, generated_answer):
        return (
            "❌ Hallucination" if generated_answer != "i don't know"
            else "✅ I don't know"
        )

    def evaluate_model(self, answers_column="generated_answer"):
        # (Reconstructed) returns two frequency tables: one for rows where an answer is
        # expected, one for rows where "I don't know" is expected
        answer_expected, idk_expected = [], []
        for _, row in self.df.iterrows():
            generated_answer = str(row[answers_column]).lower().strip()
            actual_answers = [ans.lower() for ans in row["answers"]]
            if actual_answers:
                answer_expected.append(self._evaluate_answer_expected(generated_answer, actual_answers))
            else:
                idk_expected.append(self._evaluate_idk_expected(generated_answer))
        self.y_pred = pd.Series(answer_expected + idk_expected)
        return (
            pd.Series(answer_expected).value_counts().reindex(self.labels_answer_expected, fill_value=0),
            pd.Series(idk_expected).value_counts().reindex(self.labels_idk_expected, fill_value=0),
        )

    def print_eval(self):
        answer_columns = ["generated_answer", "ft_generated_answer"]
        baseline_correctness, baseline_idk = self.evaluate_model(answer_columns[0])
        ft_correctness, ft_idk = self.evaluate_model(answer_columns[1])
        print("When the model should answer correctly:")
        eval_df = pd.merge(
            baseline_correctness.rename("Baseline"),
            ft_correctness.rename("Fine-Tuned"),
            left_index=True,
            right_index=True,
        )
        print(eval_df)
        print("\n\n\nWhen the model should say 'I don't know':")
        eval_df = pd.merge(
            baseline_idk.rename("Baseline"),
            ft_idk.rename("Fine-Tuned"),
            left_index=True,
            right_index=True,
        )
        print(eval_df)

    def plot_model_comparison(self, answer_columns, scenario="answer_expected"):
        # Collect per-model frequencies for the requested scenario and plot them side by side
        results = []
        for col in answer_columns:
            answer_expected, idk_expected = self.evaluate_model(col)
            if scenario == "answer_expected":
                results.append(answer_expected)
            elif scenario == "idk_expected":
                results.append(idk_expected)
            else:
                raise ValueError("Invalid scenario")
        results_df = pd.DataFrame(results, index=answer_columns)
        results_df.plot(kind="barh", stacked=True)
        plt.xlabel("Number of questions")
        plt.show()
Notice that the fine-tuned model has learnt to say "I don't know" a lot better than the prompted
base model. Put differently, the model has gotten better at skipping questions it cannot answer.
Observations
1. The fine-tuned model is better at saying "I don't know"
Correct answers also drop from 83% to 60% with fine-tuning - this is because the fine-tuned
model is more conservative and says "I don't know" more often. This is a good thing because
it's better to say "I don't know" than to give a wrong answer.
That said, we want to improve the correctness of the model, even if that increases the
hallucinations. We're looking for a model that is both correct and conservative, striking a
balance between the two. We'll use Qdrant and Few-Shot Learning to achieve this.
What is next?
6. Fine-Tuning OpenAI Model with Qdrant
   6.1 Embed the Fine-Tuning Data
   6.2 Embedding the Questions
8. Evaluation
So far, we've been using the OpenAI model to answer questions without using examples of the
answer. The previous step made it work better on in-context examples, while this one helps it
generalize to unseen data, and attempts to teach it when to say "I don't know" and when to give
an answer.
This is where few-shot learning comes in!
Few-shot learning is a form of transfer learning: by providing a few examples of the kind of
answer we're looking for directly in the prompt, the model learns how to answer new questions
in the same way, including saying "I don't know" when the answer is not present in the context.
Embeddings are a way to represent sentences as an array of floats. We'll use the embeddings to
find the most similar questions to the ones we're looking for.
import os
from tqdm.auto import tqdm  # needed for tqdm.pandas() below

from qdrant_client import QdrantClient
from qdrant_client.http import models
from qdrant_client.http.models import PointStruct
from qdrant_client.http.models import Distance, VectorParams

# DefaultEmbedding is provided by the fastembed package
from fastembed.embedding import DefaultEmbedding

qdrant_client = QdrantClient(
    url=os.getenv("QDRANT_URL"), api_key=os.getenv("QDRANT_API_KEY"), timeout=6000, prefer_grpc=True
)

collection_name = "squadv2-cookbook"

tqdm.pandas()
embedding_model = DefaultEmbedding()
❗️⏰ Important Note: This step can take up to 3 hours to complete. Please be patient. If you
see Out of Memory errors or Kernel Crashes, please reduce the batch size to 32, restart the
kernel and run the notebook again. This code needs to be run only ONCE.
1. Initialization: batch_size = 512 and total_batches set the stage for how many questions
will be processed in one go. This is to prevent memory issues. If your machine can handle
more, feel free to increase the batch size. If your kernel crashes, reduce the batch size to 32
and try again.
2. Progress Bar: tqdm gives you a nice progress bar so you don't fall asleep.
3. Batch Loop: The for-loop iterates through batches. start_idx and end_idx define the
slice of the DataFrame to process.
4. Embedding Generation: the questions in each batch are embedded in bulk and appended to a running embeddings_list.
5. PointStruct Creation: finally, the function returns the list of PointStruct objects, which can be used to create a collection in Qdrant.
    pbar.close()

    # Create a temporary DataFrame to hold the embeddings and existing DataFrame columns
    temp_df = df.copy()
    temp_df["embeddings"] = embeddings_list
    temp_df["id"] = temp_df.index

    # (Reconstructed tail) one PointStruct per row: the question embedding as the vector,
    # the remaining columns as the payload
    points = [
        PointStruct(id=int(row["id"]), vector=[float(x) for x in row["embeddings"]], payload=row.drop("embeddings").to_dict())
        for _, row in temp_df.iterrows()
    ]
    return points

points = generate_points_from_dataframe(train_df)
Note that configuring Qdrant is outside the scope of this notebook. Please refer to the Qdrant
documentation for more information. We used a timeout of 600 seconds for the upload, and gRPC
compression to speed it up.
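Creating the collection itself is not shown above; a minimal sketch, where the vector size is read off the embedding model rather than hard-coded, could look like this:

sample_embedding = list(embedding_model.embed(["sample sentence"]))[0]

qdrant_client.recreate_collection(
    collection_name=collection_name,
    vectors_config=VectorParams(
        size=len(sample_embedding),
        distance=Distance.COSINE,
    ),
)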
operation_info = qdrant_client.upsert(
collection_name=collection_name, wait=True, points=points
)
print(operation_info)
Now that we've uploaded the embeddings to Qdrant, we can use Qdrant to find the most
similar questions to the question we're looking for. We'll use the top 5 most similar questions to
create a prompt that we can use to fine-tune the model. We'll then measure the performance of
the fine-tuned model on the same validation set, but with few shot prompting!
Our main function get_few_shot_prompt serves as the workhorse for generating prompts for
few-shot learning. It does this by retrieving similar questions from Qdrant - a vector search
engine, using an embeddings model. Here is the high-level workflow:
1. Retrieve similar questions from Qdrant where the answer is present in the context
2. Retrieve similar questions from Qdrant where the answer is IMPOSSIBLE to find in the given
context, i.e. the expected answer is "I don't know"
5. Evaluate the fine-tuned model on the validation set with the same prompting technique
def get_few_shot_prompt(row):
    query, row_context = row["question"], row["context"]
    embeddings = list(embedding_model.embed([query]))
    query_embedding = embeddings[0].tolist()
    num_of_qa_to_retrieve = 5
    instruction = """Answer the following Question based on the Context only. Only answer from the Context. If you don't know the answer, say 'I don't know'."""

    # Retrieve similar questions whose context contains the answer, and similar questions whose
    # context does not (assumes an `is_impossible` flag was stored in each point's payload)
    def search_similar(is_impossible):
        return qdrant_client.search(
            collection_name=collection_name,
            query_vector=query_embedding,
            query_filter=models.Filter(
                must=[models.FieldCondition(key="is_impossible", match=models.MatchValue(value=is_impossible))]
            ),
            with_payload=True,
            limit=num_of_qa_to_retrieve,
        )

    q1, q2 = search_similar(False), search_similar(True)

    # Convert a retrieved point into a (user, assistant) example pair
    def q_to_prompt(q):
        question, context = q.payload["question"], q.payload["context"]
        answer = q.payload["answers"][0] if len(q.payload["answers"]) > 0 else "I don't know"
        return [
            {
                "role": "user",
                "content": f"""Question: {question}\n\nContext: {context}\n\nAnswer:"""
            },
            {"role": "assistant", "content": answer},
        ]

    # If there is a next best question, add it to the prompt
    rag_prompt = []
    if len(q1) >= 2:
        rag_prompt += q_to_prompt(q1[1])
    if len(q2) >= 2:
        rag_prompt += q_to_prompt(q2[1])
    if len(q1) >= 3:
        rag_prompt += q_to_prompt(q1[2])

    # Finally, the actual question and context to answer
    rag_prompt += [
        {
            "role": "user",
            "content": f"""Question: {query}\n\nContext: {row_context}\n\nAnswer:"""
        },
    ]
    return [{"role": "system", "content": instruction}] + rag_prompt
# ⏰ Time: 2 min
train_sample["few_shot_prompt"] = train_sample.progress_apply(get_few_shot_prompt, axis=1)
fine_tuner = OpenAIFineTuner(
training_file_path="local_cache/100_train_few_shot.jsonl",
model_name="gpt-3.5-turbo",
suffix="trnfewshot20230907"
)
model_id = fine_tuner.fine_tune_model()
model_id
8. Evaluation
But how well does the model perform? Let's compare the results from the 3 different models
we've looked at so far:
evaluator = Evaluator(df)
evaluator.plot_model_comparison(["generated_answer", "ft_generated_answer", "ft_generated_answer_few_shot"])
This is quite amazing -- we're able to get the best of both worlds! We're able to get the model
to be both correct and conservative:
1. The model is correct 83% of the time -- this is the same as the base model
2. The model gives the wrong answer only 8% of the time -- down from 17% with the base
model
Next, let's look at the hallucinations. We want to reduce the hallucinations, but not at the cost of
correctness. We want to strike a balance between the two. We've struck a good balance here:
1. The model hallucinates 53% of the time -- down from 100% with the base model
2. The model says "I don't know" 47% of the time -- up from NEVER with the base model
Few Shot Fine-Tuning with Qdrant is a great way to control and steer the performance of your
RAG system. Here, we made the model less conservative compared to zero shot and more
confident by using Qdrant to find similar questions.
You can also use Qdrant to make the model more conservative, as we did here by giving examples
of questions where the answer is not present in the context. This biases the model to say "I don't
know" more often.
Similarly, one can also use Qdrant to make the model more confident by giving examples of
questions where the answer is present in the context. This biases the model to give an answer
more often. The trade-off is that the model will also hallucinate more often.
You can make this trade off by adjusting the training data: distribution of questions and
examples, as well as the kind and number of examples you retrieve from Qdrant.
9. Conclusion
In this notebook, we've demonstrated how to fine-tune OpenAI models for specific use-cases.
We've also demonstrated how to use Qdrant and Few-Shot Learning to improve the
performance of the model.
Aggregate Results
So far, we've looked at the results for each scenario separately, i.e. each scenario summed to
100. Let's look at the results as an aggregate to get a broader sense of how the model is
performing:
          Base model    Fine-tuned    Fine-tuned with Qdrant few-shot
Skipped   0%            18%           5%
Wrong     9%            3%            4%
Observations
Compared to base model
1. The few shot fine-tuned with Qdrant model is as good as the base model at answering
questions where the answer is present in the context.
2. The few shot fine-tuned with Qdrant model is better at saying "I don't know" when the
answer is not present in the context.
3. The few shot fine-tuned with Qdrant model is better at reducing hallucinations.
Compared to fine-tuned model
1. The few shot fine-tuned with Qdrant model gets more correct answers than the fine-tuned
model: 83% of the questions are answered correctly vs 60% for the fine-tuned model
2. The few shot fine-tuned with Qdrant model is better at deciding when to say "I don't know"
when the answer is not present in the context: a 34% skip rate for the plain fine-tuned model,
vs 9% for the few shot fine-tuned with Qdrant model
1. Notice the trade-offs between number of correct answers and hallucinations -- and how
training dataset choice influences that!
2. Fine-tune OpenAI models for specific use-cases and use Qdrant to improve the
performance of your RAG model
Cassandra / Astra DB
Stefano Lottini
Open in Github
Aug 28, 2023
The demos in this directory show how to use the Vector Search capabilities available today in
DataStax Astra DB, a serverless Database-as-a-Service built on Apache Cassandra®.
These example notebooks demonstrate implementation of the same GenAI standard RAG
workload with different libraries and APIs.
To use Astra DB with its HTTP API interface, head to the "AstraPy" notebook ( astrapy is the
Python client to interact with the database).
If you prefer CQL access to the database (either with Astra DB or a Cassandra cluster
supporting vector search), check the "CQL" or "CassIO" notebooks -- they differ in the level of
abstraction you get to work at.
If you want to know more about Astra DB and its Vector Search capabilities, head over to
datastax.com.
Example notebooks
The following examples show how easily OpenAI and DataStax Astra DB can work together to
power vector-based AI applications. You can run them either with your local Jupyter engine or
as Colab notebooks:
Prerequisites
Collecting wget
Downloading wget-3.2.zip (10 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: wget
  Building wheel for wget (setup.py) ... done
  Created wheel for wget: filename=wget-3.2-py3-none-any.whl size=9657 sha256=10fd8aa1d20
  Stored in directory: /home/trusted-service-user/.cache/pip/wheels/8b/f1/7f/5c94f0a7a505ca1c81
Successfully built wget
Installing collected packages: wget
Successfully installed wget-3.2
Collecting openai
Downloading openai-0.27.6-py3-none-any.whl (71 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 71.9/71.9 kB 1.7 MB/s
Requirement already satisfied: tqdm in /home/trusted-service-user/cluster-env/trident_env
Requirement already satisfied: requests>=2.20 in /home/trusted-service-user/cluster-env/trident
Requirement already satisfied: aiohttp in /home/trusted-service-user/cluster-env/trident_env/li
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /home/trusted-service-user/cluster-env/
Requirement already satisfied: certifi>=2017.4.17 in /home/trusted-service-user/cluster-env/tri
Requirement already satisfied: idna<4,>=2.5 in /home/trusted-service-user/cluster-env/trident_e
Requirement already satisfied: charset-normalizer<4,>=2 in /home/trusted-service-user/cluster-e
Requirement already satisfied: attrs>=17.3.0 in /home/trusted-service-user/cluster-env/trident_
Requirement already satisfied: frozenlist>=1.1.1 in /home/trusted-service-user/cluster-env/trid
Requirement already satisfied: multidict<7.0,>=4.5 in /home/trusted-service-user/cluster-env/tr
Requirement already satisfied: yarl<2.0,>=1.0 in /home/trusted-service-user/cluster-env/trident
Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /home/trusted-service-user/cluste
Requirement already satisfied: aiosignal>=1.1.2 in /home/trusted-service-user/cluster-env/tride
Installing collected packages: openai
Successfully installed openai-0.27.6
In this section we are going to load prepared embedding data, so you don't have to recompute
the embeddings of Wikipedia articles with your own credits.
import wget
embeddings_url = "https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip"
import zipfile
import pandas as pd
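The download and extraction steps are not shown in this excerpt; a minimal sketch that places the file where the next cell expects it (paths assumed from that cell):

from ast import literal_eval  # needed to parse the vector columns below

wget.download(embeddings_url, "vector_database_wikipedia_articles_embedded.zip")
with zipfile.ZipFile("vector_database_wikipedia_articles_embedded.zip", "r") as zip_ref:
    zip_ref.extractall("/lakehouse/default/Files/data")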
article_df = pd.read_csv('/lakehouse/default/Files/data/vector_database_wikipedia_articles_embedded.c
# Read vectors from strings back into a list
article_df["title_vector"] = article_df.title_vector.apply(literal_eval)
article_df["content_vector"] = article_df.content_vector.apply(literal_eval)
article_df.head()
Create a table and load the vectors into Kusto based on the contents of the dataframe, as sketched
below. The Spark option CreateIfNotExist will automatically create the table if it doesn't exist.
# replace with your AAD Tenant ID, Kusto Cluster URI, Kusto DB name and Kusto Table
AAD_TENANT_ID = ""
KUSTO_CLUSTER = ""
KUSTO_DATABASE = "Vector"
KUSTO_TABLE = "Wiki"
# Replace the auth method based on your desired authentication mechanism - https://github.com/Azure/
kustoOptions = {"kustoCluster": KUSTO_CLUSTER, "kustoDatabase": KUSTO_DATABASE, "kustoTable": KUSTO_TABLE}
access_token = mssparkutils.credentials.getToken(kustoOptions["kustoCluster"])
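The actual write to Kusto is not included in this excerpt. A rough sketch using the Synapse/Fabric Kusto Spark connector and the implicit spark session (option names are assumptions):

sparkDF = spark.createDataFrame(article_df)
(sparkDF.write
    .format("com.microsoft.kusto.spark.synapse.datasource")
    .option("kustoCluster", kustoOptions["kustoCluster"])
    .option("kustoDatabase", kustoOptions["kustoDatabase"])
    .option("kustoTable", kustoOptions["kustoTable"])
    .option("accessToken", access_token)
    .option("tableCreateOptions", "CreateIfNotExist")
    .mode("Append")
    .save())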
The OpenAI API key is used for vectorization of the documents and queries. You can follow the
instructions to create and retrieve your Azure OpenAI key and endpoint.
https://learn.microsoft.com/en-us/azure/cognitive-services/openai/tutorials/embeddings
Please make sure to use the text-embedding-3-small model. Since the precomputed
embeddings were created with the text-embedding-3-small model, we also have to use it during
search.
import openai
openai.api_version = '2022-12-01'
openai.api_base = '' # Please add your endpoint here
openai.api_type = 'azure'
openai.api_key = '' # Please add your api key here
def embed(query):
# Creates embedding vector from user query
embedded_query = openai.Embedding.create(
input=query,
deployment_id="embed", #replace with your deployment id
chunk_size=1
)["data"][0]["embedding"]
return embedded_query
StatementMeta(, 7e5070d2-4560-4fb8-a3a8-6a594acd58ab, 44, Finished, Available)
If using Open AI
Only run this cell if you plan to use Open AI for embedding
openai.api_key = ""
def embed(query):
# Creates embedding vector from user query
embedded_query = openai.Embedding.create(
input=query,
model="text-embedding-3-small",
)["data"][0]["embedding"]
return embedded_query
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder
from azure.kusto.data.helpers import dataframe_from_result_table

KCSB = KustoConnectionStringBuilder.with_aad_device_authentication(
    KUSTO_CLUSTER)
KCSB.authority_id = AAD_TENANT_ID
KUSTO_CLIENT = KustoClient(KCSB)
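The embedding of the query and the KQL similarity search are not shown in this excerpt. A rough sketch (the query text is illustrative, and the cosine-similarity function name is an assumption; your cluster may expose it as a stored function instead):

searched_embedding = embed("places where people worship")  # illustrative query

KUSTO_QUERY = (
    f"{KUSTO_TABLE} "
    f"| extend similarity = series_cosine_similarity(dynamic({searched_embedding}), content_vector) "
    "| top 10 by similarity desc"
)
RESPONSE = KUSTO_CLIENT.execute(KUSTO_DATABASE, KUSTO_QUERY)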
df = dataframe_from_result_table(RESPONSE.primary_results[0])
df
df = dataframe_from_result_table(RESPONSE.primary_results[0])
df
Searching for relevant information can sometimes feel like looking for a needle in a haystack,
but don’t despair, GPTs can actually do a lot of this work for us. In this guide we explore a way
to augment existing search systems with various AI techniques, helping us sift through the
noise.
1. Mimicking Human Browsing: GPT triggers a search, evaluates the results, and modifies the
search query if necessary. It can also follow up on specific search results to form a chain of
thought, much like a human user would do.
2. Retrieval with Embeddings: Calculate embeddings for your content and a user query, and
then retrieve the content most related as measured by cosine similarity. This technique is
used heavily by search engines like Google.
These approaches are both promising, but each has their shortcomings: the first one can be
slow due to its iterative nature and the second one requires embedding your entire knowledge
base in advance, continuously embedding new content and maintaining a vector database.
By combining these approaches, and drawing inspiration from re-ranking methods, we identify
an approach that sits in the middle. This approach can be implemented on top of any existing
search system, like the Slack search API, or an internal ElasticSearch instance with private
data. Here’s how it works:
Step 1: Search
Step 2: Re-rank
1. Embeddings for each result are used to calculate semantic similarity to a generated
hypothetical ideal answer to the user question.
Step 3: Answer
1. Given the top search results, the model generates an answer to the user’s question,
including references and links.
This hybrid approach offers relatively low latency and can be integrated into any existing search
endpoint, without requiring the upkeep of a vector database. Let's dive into it! We will use the
News API as an example domain to search over.
Setup
In addition to your OPENAI_API_KEY , you'll have to include a NEWS_API_KEY in your
environment. You can get an API key here.
%%capture
%env NEWS_API_KEY = YOUR_NEWS_API_KEY
# Dependencies
from datetime import date, timedelta # date handling for fetching recent news
from IPython import display # for pretty printing
import json # for parsing the JSON api responses and model outputs
from numpy import dot # for cosine similarity
from openai import OpenAI
import os # for loading environment variables
import requests # for making the API requests
from tqdm.notebook import tqdm # for printing progress bars
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>
GPT_MODEL = "gpt-3.5-turbo"
# Helper functions
def json_gpt(input: str):
completion = client.chat.completions.create(model=GPT_MODEL,
messages=[
{"role": "system", "content": "Output only valid JSON"},
{"role": "user", "content": input},
],
temperature=0.5)
text = completion.choices[0].message.content
parsed = json.loads(text)
return parsed
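The re-ranking step further below also relies on an embeddings helper that is not shown in this excerpt; a minimal version using the same client might look like this:

def embeddings(input: list[str]) -> list[list[float]]:
    # Embed a batch of strings with the OpenAI embeddings endpoint
    response = client.embeddings.create(model="text-embedding-3-small", input=input)
    return [data.embedding for data in response.data]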
1. Search
Now, in order to be as exhaustive as possible, we use the model to generate a list of diverse
queries based on this question.
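The cell defining the user question is not included in this excerpt; judging from the hypothetical answer and search results shown below, it is something along these lines (treat the exact wording as an assumption):

USER_QUESTION = "Who won the NBA championship? And who was the MVP? Tell me a bit about the last game."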
QUERIES_INPUT = f"""
You have access to a search API that returns recent news articles.
Generate an array of search queries that are relevant to this question.
Use a variation of related keywords for the queries, trying to be as general as possible.
Include as many queries as you can think of, including and excluding terms.
For example, include queries like ['keyword_1 keyword_2', 'keyword_1', 'keyword_2'].
Be creative. The more queries you include, the more likely you are to find relevant results.

User question: {USER_QUESTION}

Format: {{"queries": ["query_1", "query_2", "query_3"]}}
"""
queries = json_gpt(QUERIES_INPUT)["queries"]
queries
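The helper that actually calls the news search API is not shown in full in this excerpt. A minimal sketch, assuming the News API /v2/everything endpoint and the NEWS_API_KEY configured above (parameters are illustrative):

def search_news(query: str, num_articles: int = 50) -> dict:
    # Query the News API for recent articles matching the query
    response = requests.get(
        "https://newsapi.org/v2/everything",
        params={
            "q": query,
            "apiKey": os.environ["NEWS_API_KEY"],
            "pageSize": num_articles,
            "sortBy": "relevancy",
        },
    )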
    return response.json()

articles = []
for query in tqdm(queries):
    result = search_news(query)
    if result.get("status") == "ok":
        articles += result["articles"]

# remove duplicates
articles = list({article["url"]: article for article in articles}.values())
Title: Nascar takes on Le Mans as LeBron James gets centenary race under way
Description: <ul><li>Nascar has presence at iconic race for first time since 1976</li><li>NBA s
Content: The crowd chanted U-S-A! U-S-A! as Nascar driver lineup for the 24 Hours of Le Mans pa
Title: NBA finals predictions: Nuggets or Heat? Our writers share their picks
Description: Denver or Miami? Our contributors pick the winner, key players and dark horses bef
Content: The Nuggets are here because
A lot has been made of the importance of a balanced roster with conti...
Title: Unboxing: Michelob ULTRA and Artist Futura Enshrine the NBA Championship In Custom Hand-
Description: As the 2022-2023 NBA Championship nears the end, Michelob ULTRA brings joy to spor
Content: As the 2022-2023 NBA Championship nears the end, Michelob ULTRA brings joy to sports f
Title: Futura and Michelob ULTRA Toast to the NBA Finals With Abstract Artwork Crafted From the
Description: The sun is out to play, and so is Michelob ULTRA. With the 2022-2023 NBA Finals un
Content: The sun is out to play, and so is Michelob ULTRA. With the 2022-2023 NBA Finals underw
Title: Signed and Delivered, Futura and Michelob ULTRA Will Gift Hand-Painted Bottles to This Y
Description: Michelob ULTRA, the MVP of joy and official beer sponsor of the NBA is back to cel
Content: Michelob ULTRA, the MVP of joy and official beer sponsor of the NBA is back to celebra
As we can see, oftentimes, the search queries will return a large number of results, many of
which are not relevant to the original question asked by the user. In order to improve the
quality of the final answer, we use embeddings to re-rank and filter the results.
2. Re-rank
Drawing inspiration from HyDE (Gao et al.), we first generate a hypothetical ideal answer to
compare our results against. This helps prioritize results that look like good answers, rather than
those merely similar to our question. Here's the prompt we use to generate our hypothetical
answer.
HA_INPUT = f"""
Generate a hypothetical answer to the user's question. This answer will be used to rank search results.
Pretend you have all the information you need to answer, but don't use any actual facts. Instead, use placeholders
like NAME did something, or NAME said something at PLACE.

User question: {USER_QUESTION}

Format: {{"hypotheticalAnswer": "hypothetical answer text"}}
"""
hypothetical_answer = json_gpt(HA_INPUT)["hypotheticalAnswer"]
hypothetical_answer
'The NBA championship was won by TEAM NAME. The MVP was awarded to PLAYER NAME. The last game w
Now, let's generate embeddings for the search results and the hypothetical answer. We then
calculate the cosine distance between these embeddings, giving us a semantic similarity metric.
Note that we can simply calculate the dot product in lieu of doing a full cosine similarity
calculation since the OpenAI embeddings are returned normalized in our API.
hypothetical_answer_embedding = embeddings(hypothetical_answer)[0]
article_embeddings = embeddings(
[
f"{article['title']} {article['description']} {article['content'][0:100]}"
for article in articles
]
)
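The similarity scores themselves are not shown being computed in this excerpt; because the embeddings are normalized, the dot product is enough:

cosine_similarities = [dot(hypothetical_answer_embedding, article_embedding) for article_embedding in article_embeddings]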
cosine_similarities[0:10]
[0.7854456526852069,
0.8086023500072106,
0.8002998147018501,
0.7961229569526956,
0.798354506673743,
0.758216458795653,
0.7753754083127359,
0.7494958338411927,
0.804733946801739,
0.8405965885235218]
Finally, we use these similarity scores to sort and filter the results.
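The sorting step itself is elided here; a minimal version that pairs each article with its score and prints the top 5 (matching the output below):

scored_articles = zip(articles, cosine_similarities)
# Sort articles by cosine similarity, highest first
sorted_articles = sorted(scored_articles, key=lambda x: x[1], reverse=True)

print("Top 5 articles:", "\n")
for article, score in sorted_articles[0:5]:
    print("Title:", article["title"])
    print("Description:", article["description"])
    print("Content:", article["content"][0:100] + "...")
    print("Score:", score)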
Top 5 articles:
Title: NBA Finals: Denver Nuggets beat Miami Hea, lift thier first-ever NBA title
Description: Denver Nuggets won their maiden NBA Championship trophy defeating Miami Heat 94-89
Content: Denver Nuggets won their maiden NBA Championship trophy defeating Miami Heat 94-89 in
Score: 0.8445817523602124
Title: Denver Nuggets win first NBA championship title in Game 5 victory over Miami Heat
Description: The Denver Nuggets won their first NBA championship Monday night, downing the Miam
Content: The Denver Nuggets won their first NBA championship Monday night, downing the Miami He
Score: 0.8409346078172385
Title: Denver Nuggets Capture Their First NBA Championship Behind Unbreakable Chemistry
Description: After 47 years of waiting, the Denver Nuggets are NBA champions. Led by Nikola Jok
Content: DENVER, CO - JUNE 12: Jamal Murray (27) of the Denver Nuggets celebrates as he leaves
Score: 0.8405965885235218
Title: NBA Finals: Nikola Jokic, Denver Nuggets survive Miami Heat to secure franchise's first
Description: In a rock-fight of a Game 5, the Denver Nuggets reached the NBA mountaintop from t
Content: DENVER, COLORADO - JUNE 12: Jamal Murray #27 of the Denver Nuggets reacts during the f
Score: 0.8389716330890262
Awesome! These results look a lot more relevant to our original query. Now, let's use the top 5
results to generate a final answer.
3. Answer
formatted_top_results = [
{
"title": article["title"],
"description": article["description"],
"url": article["url"],
}
for article, _score in sorted_articles[0:5]
]
ANSWER_INPUT = f"""
Generate an answer to the user's question based on the given search results.
TOP_RESULTS: {formatted_top_results}
USER_QUESTION: {USER_QUESTION}
Include as much information as possible in the answer. Reference the relevant search result urls as m
"""
completion = client.chat.completions.create(
model=GPT_MODEL,
messages=[{"role": "user", "content": ANSWER_INPUT}],
temperature=0.5,
stream=True,
)
text = ""
for chunk in completion:
    # The final streamed chunk may carry no content
    if chunk.choices[0].delta.content is not None:
        text += chunk.choices[0].delta.content
    display.clear_output(wait=True)
    display.display(display.Markdown(text))
<IPython.core.display.Markdown object>
This notebook provides an introduction to using Redis as a vector database with OpenAI
embeddings. Redis is a scalable, real-time database that can be used as a vector database when
using the RediSearch Module. The RediSearch module allows you to index and search for
vectors in Redis. This notebook will show you how to use the RediSearch module to index and
search for vectors created by using the OpenAI API and stored in Redis.
What is Redis?
Most developers from a web services background are probably familiar with Redis. At its core,
Redis is an open-source key-value store that can be used as a cache, message broker, and
database. Developers choose Redis because it is fast, has a large ecosystem of client libraries,
and has been deployed by major enterprises for years.
In addition to these traditional uses, Redis also provides Redis Modules, which are a way
to extend Redis with new data types and commands. Example modules include RedisJSON,
RedisTimeSeries, RedisBloom and RediSearch.
What is RediSearch?
RediSearch is a Redis module that provides querying, secondary indexing, full-text search and
vector search for Redis. To use RediSearch, you first declare indexes on your Redis data. You can
then use the RediSearch clients to query that data. For more information on the feature set of
RediSearch, see the README or the RediSearch documentation.
Deployment options
There are a number of ways to deploy Redis. For local development, the quickest method is to
use the Redis Stack docker container which we will use here. Redis Stack contains a number of
Redis modules that can be used together to create a fast, multi-model data store and query
engine.
For production use cases, the easiest way to get started is to use the Redis Cloud service. Redis
Cloud is a fully managed Redis service. You can also deploy Redis on your own infrastructure
using Redis Enterprise, a fully managed Redis offering that can be deployed in Kubernetes,
on premises, or in the cloud.
Additionally, every major cloud provider (AWS Marketplace, Google Marketplace, or Azure
Marketplace) offers Redis Enterprise in a marketplace offering.
Prerequisites
install libraries
Redis-py
===========================================================
Start Redis
To keep this example simple, we will use the Redis Stack docker container which we can start as
follows
$ docker-compose up -d
This also includes the RedisInsight GUI for managing your Redis database which you can view
at http://localhost:8001 once you start the docker container.
You're all set up and ready to go! Next, we import and create our client for communicating with
the Redis database we just created.
Install Requirements
Redis-Py is the python client for communicating with Redis. We will use this to communicate
with our Redis-stack database.
===========================================================
If you don't have an OpenAI API key, you can get one from
https://beta.openai.com/account/api-keys.
Once you get your key, please add it to your environment variables as OPENAI_API_KEY by
using following command:
# Test that your OpenAI API key is correctly set as an environment variable
# Note. if you run this notebook locally, you will need to reload your terminal and the notebook for
import os
import openai
# Note. alternatively you can set a temporary env variable like this:
# os.environ["OPENAI_API_KEY"] = 'sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
Load data
In this section we'll load embedded data that has already been converted into vectors. We'll use
this data to create an index in Redis and then search for similar vectors.
import sys
import numpy as np
import pandas as pd
from typing import List
nbutils.download_wikipedia_data()
data = nbutils.read_wikipedia_data()
data.head()
File Downloaded
Connect to Redis
Now that we have our Redis database running, we can connect to it using the Redis-py client.
We will use the default host and port for the Redis database which is localhost:6379 .
import redis
from redis.commands.search.indexDefinition import (
IndexDefinition,
IndexType
)
from redis.commands.search.query import Query
from redis.commands.search.field import (
TextField,
VectorField
)
REDIS_HOST = "localhost"
REDIS_PORT = 6379
REDIS_PASSWORD = "" # default for passwordless Redis
# Connect to Redis
redis_client = redis.Redis(
host=REDIS_HOST,
port=REDIS_PORT,
password=REDIS_PASSWORD
)
redis_client.ping()
True
The below cells will show how to specify and create a search index in Redis. We will:
1. Set some constants for defining our index like the distance metric and the index name
# Constants
VECTOR_DIM = len(data['title_vector'][0]) # length of the vectors
VECTOR_NUMBER = len(data) # initial number of vectors
INDEX_NAME = "embeddings-index" # name of the search index
PREFIX = "doc" # prefix for the document keys
DISTANCE_METRIC = "COSINE" # distance metric for the vectors (ex. COSINE, IP, L2)
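The cells that define the searchable fields and create the initial FLAT index are not included above. A minimal sketch consistent with these constants and with the HNSW cell further below, which reuses fields (field names are assumptions based on the dataframe columns):

# Define RediSearch fields: plain text fields plus one FLAT vector field per embedding column
title = TextField(name="title")
url = TextField(name="url")
text = TextField(name="text")
title_embedding = VectorField("title_vector",
    "FLAT", {
        "TYPE": "FLOAT32",
        "DIM": VECTOR_DIM,
        "DISTANCE_METRIC": DISTANCE_METRIC,
        "INITIAL_CAP": VECTOR_NUMBER,
    }
)
text_embedding = VectorField("content_vector",
    "FLAT", {
        "TYPE": "FLOAT32",
        "DIM": VECTOR_DIM,
        "DISTANCE_METRIC": DISTANCE_METRIC,
        "INITIAL_CAP": VECTOR_NUMBER,
    }
)
fields = [title, url, text, title_embedding, text_embedding]

# Create the index if it does not already exist
try:
    redis_client.ft(INDEX_NAME).info()
    print("Index already exists")
except Exception:
    redis_client.ft(INDEX_NAME).create_index(
        fields=fields,
        definition=IndexDefinition(prefix=[PREFIX], index_type=IndexType.HASH)
    )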
Now that we have a search index, we can load documents into it. We will use the same
documents we used in the previous examples. In Redis, either the HASH or JSON (if using
RedisJSON in addition to RediSearch) data types can be used to store documents. We will use
the HASH data type in this example. The below cells will show how to load documents into the
index.
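The loading cell itself is not shown; a minimal sketch that writes each row as a HASH under the configured prefix (vectors are converted to float32 byte strings, as RediSearch expects):

def index_documents(client: redis.Redis, prefix: str, documents: pd.DataFrame):
    records = documents.to_dict("records")
    for doc in records:
        key = f"{prefix}:{str(doc['id'])}"
        # create byte vectors for title and content, replacing the list-of-floats columns
        doc["title_vector"] = np.array(doc["title_vector"], dtype=np.float32).tobytes()
        doc["content_vector"] = np.array(doc["content_vector"], dtype=np.float32).tobytes()
        client.hset(key, mapping=doc)

index_documents(redis_client, PREFIX, data)
print(f"Loaded {len(data)} documents into Redis search index: {INDEX_NAME}")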
Now that we have a search index and documents loaded into it, we can run search queries.
Below we will provide a function that will run a search query and return the results. Using this
function we run a few queries that will show how you can utilize Redis as a vector database.
def search_redis(
redis_client: redis.Redis,
user_query: str,
index_name: str = "embeddings-index",
vector_field: str = "title_vector",
return_fields: list = ["title", "url", "text", "vector_score"],
hybrid_fields = "*",
k: int = 20,
print_results: bool = True,
) -> List[dict]:
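    # (Sketch of the body, which is not shown in this excerpt.)
    # Embed the user query; the model name is an assumption -- use the same model that
    # produced the stored vectors.
    embedded_query = openai.Embedding.create(
        input=user_query,
        model="text-embedding-3-small",
    )["data"][0]["embedding"]

    # Build a KNN query against the requested vector field, optionally pre-filtered by hybrid_fields
    base_query = f'{hybrid_fields}=>[KNN {k} @{vector_field} $vector AS vector_score]'
    query = (
        Query(base_query)
        .return_fields(*return_fields)
        .sort_by("vector_score")
        .paging(0, k)
        .dialect(2)
    )
    params_dict = {"vector": np.array(embedded_query).astype(dtype=np.float32).tobytes()}

    # Perform the vector search
    results = redis_client.ft(index_name).search(query, params_dict)
    if print_results:
        for i, article in enumerate(results.docs):
            score = 1 - float(article.vector_score)
            print(f"{i}. {article.title} (Score: {round(score, 3)})")
    return results.docs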
The previous examples showed how to run vector search queries with RediSearch. In this section,
we will show how to combine vector search with other RediSearch fields for hybrid search. In the
below example, we will combine vector search with full text search.
def create_hybrid_field(field_name: str, value: str) -> str:
return f'@{field_name}:"{value}"'
# search the content vector for articles about famous battles in Scottish history and only include re
results = search_redis(redis_client,
"Famous battles in Scottish history",
vector_field="title_vector",
k=5,
hybrid_fields=create_hybrid_field("title", "Scottish")
)
# run a hybrid query for articles about Art in the title vector and only include results with the phr
results = search_redis(redis_client,
"Art",
vector_field="title_vector",
k=5,
hybrid_fields=create_hybrid_field("text", "Leonardo da Vinci")
)
# find specific mention of Leonardo da Vinci in the text that our full-text-search query returned
mention = [sentence for sentence in results[0].text.split("\n") if "Leonardo da Vinci" in sentence][0
mention
'In Europe, after the Middle Ages, there was a "Renaissance" which means "rebirth". People redi
HNSW Index
Up until now, we've been using the FLAT or "brute-force" index to run our queries. Redis also
supports the HNSW index which is a fast, approximate index. The HNSW index is a graph-based
index that uses a hierarchical navigable small world graph to store vectors. The HNSW index is a
good choice for large datasets where you want to run approximate queries.
HNSW will take longer to build and consume more memory for most cases than FLAT but will
be faster to run queries on, especially for large datasets.
The following cells will show how to create an HNSW index and run queries with it using the
same data as before.
import time
# Check if index exists
HNSW_INDEX_NAME = INDEX_NAME+ "_HNSW"
try:
redis_client.ft(HNSW_INDEX_NAME).info()
print("Index already exists")
except:
# Create RediSearch Index
redis_client.ft(HNSW_INDEX_NAME).create_index(
fields = fields,
definition = IndexDefinition(prefix=[PREFIX], index_type=IndexType.HASH)
)
# since RediSearch creates the index in the background for existing documents, we will wait until
# indexing is complete before running our queries. Although this is not necessary for the first query
# some queries may take longer to run if the index is not fully built. In general, Redis will perform
# best when adding new documents to existing indices rather than new indices on existing documents.
while redis_client.ft(HNSW_INDEX_NAME).info()["indexing"] == "1":
time.sleep(5)
results = search_redis(redis_client, 'modern art in Europe', index_name=HNSW_INDEX_NAME, k=10)
# compare the results of the HNSW index to the FLAT index and time both queries
def time_queries(iterations: int = 10):
print(" ----- Flat Index ----- ")
t0 = time.time()
for i in range(iterations):
results_flat = search_redis(redis_client, 'modern art in Europe', k=10, print_results=False)
t0 = (time.time() - t0) / iterations
results_flat = search_redis(redis_client, 'modern art in Europe', k=10, print_results=True)
print(f"Flat index query time: {round(t0, 3)} seconds\n")
time.sleep(1)
print(" ----- HNSW Index ------ ")
t1 = time.time()
for i in range(iterations):
results_hnsw = search_redis(redis_client, 'modern art in Europe', index_name=HNSW_INDEX_NAME,
t1 = (time.time() - t1) / iterations
results_hnsw = search_redis(redis_client, 'modern art in Europe', index_name=HNSW_INDEX_NAME, k=1
print(f"HNSW index query time: {round(t1, 3)} seconds")
print(" ------------------------ ")
time_queries()
This notebook takes you through a simple flow to download some data, embed it, and then
index and search it using a selection of vector databases. This is a common requirement for
customers who want to store and search our embeddings with their own data in a secure
environment to support production use cases such as chatbots, topic modelling and more.
Vector databases enable enterprises to take many of the embeddings use cases we've shared in
this repo (question and answering, chatbot and recommendation services, for example), and
make use of them in a secure, scalable environment. Many of our customers use embeddings to
solve their problems at small scale, but performance and security hold them back from going
into production - we see vector databases as a key component in solving that, and in this guide
we'll walk through the basics of embedding text data, storing it in a vector database and using it
for semantic search.
Demo Flow
The demo flow is:
Setup: Import packages and set any required variables
Qdrant
Setup: Here we'll set up the Python client for Qdrant. For more details go here
Index Data: We'll create a collection with vectors for titles and content
Once you've run through this notebook you should have a basic understanding of how to setup
and use vector databases, and can move on to more complex use cases making use of our
embeddings.
Setup
Import the required libraries and set the embedding model that we'd like to use.
Collecting qdrant-client
...
Successfully installed certifi-2023.5.7 grpcio-1.56.0 grpcio-tools-1.56.0 h11-0.14.0 h2-4.1.0 h
Collecting wget
Using cached wget-3.2.zip (10 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: wget
  Building wheel for wget (setup.py) ... done
  Created wheel for wget: filename=wget-3.2-py3-none-any.whl size=9657 sha256=eb5f15f1215
Stored in directory: /home/user/.cache/pip/wheels/04/5f/3e/46cc37c5d698415694d83f607f833f83f0
Successfully built wget
Installing collected packages: wget
Successfully installed wget-3.2
import openai
import pandas as pd
import wget

# I've set this to our new embeddings model, this can be changed to the embedding model of your choice
EMBEDDING_MODEL = "text-embedding-3-small"
# Ignore unclosed SSL socket warnings - optional in case you get these errors
import warnings
Load data
In this section we'll load embedded data that we've prepared previous to this session.
embeddings_url = 'https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip'
wget.download(embeddings_url)

'vector_database_wikipedia_articles_embedded.zip'
import zipfile
with zipfile.ZipFile("vector_database_wikipedia_articles_embedded.zip","r") as zip_ref:
zip_ref.extractall("../data")
article_df = pd.read_csv('../data/vector_database_wikipedia_articles_embedded.csv')
article_df.head()
article_df.info(show_counts=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25000 entries, 0 to 24999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 25000 non-null int64
1 url 25000 non-null object
2 title 25000 non-null object
3 text 25000 non-null object
4 title_vector 25000 non-null object
5 content_vector 25000 non-null object
6 vector_id 25000 non-null object
dtypes: int64(1), object(6)
memory usage: 1.3+ MB
Qdrant
Qdrant is a high-performance vector search database written in Rust. It offers both on-premises
and cloud versions, but for the purposes of this example we're going to use the local
deployment mode.
Setup
For the local deployment, we are going to use Docker, according to the Qdrant documentation:
https://qdrant.tech/documentation/quick_start/. Qdrant requires just a single container, but
an example of the docker-compose.yaml file is available at ./qdrant/docker-compose.yaml in
this repo.
You can start a Qdrant instance locally by navigating to this directory and running docker-compose up -d
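The client object used below is created in a cell not shown here; a minimal version pointing at the local Docker instance could be:

import qdrant_client

qdrant = qdrant_client.QdrantClient(host="localhost", prefer_grpc=True)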
qdrant.get_collections()
CollectionsResponse(collections=[CollectionDescription(name='Routines')])
Index data
Qdrant stores data in collections, where each object is described by at least one vector and may
carry additional metadata called a payload. Our collection will be called Articles and each
object will be described by both title and content vectors.
We'll be using an official qdrant-client package that has all the utility methods already built-in.
from qdrant_client.http import models as rest
vector_size = len(article_df['content_vector'][0])
qdrant.recreate_collection(
collection_name='Articles',
vectors_config={
'title': rest.VectorParams(
distance=rest.Distance.COSINE,
size=vector_size,
),
'content': rest.VectorParams(
distance=rest.Distance.COSINE,
size=vector_size,
),
}
)
True
qdrant.upsert(
collection_name='Articles',
points=[
rest.PointStruct(
id=k,
vector={
'title': v['title_vector'],
'content': v['content_vector'],
},
payload=v.to_dict(),
)
for k, v in article_df.iterrows()
],
)
# Check the collection size to make sure all the points have been stored
qdrant.count(collection_name='Articles')
CountResult(count=25000)
Search Data
Once the data is put into Qdrant we will start querying the collection for the closest vectors. We
may provide an additional parameter vector_name to switch from title to content based search.
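The wrapper function around the search call below is not shown in full in this excerpt; a minimal head for it, assuming the classic openai.Embedding.create call used elsewhere in this document and a default collection name of 'Articles':

def query_qdrant(query, collection_name='Articles', vector_name='title', top_k=20):
    # Create an embedding for the user query with the same model used for indexing
    embedded_query = openai.Embedding.create(
        input=query,
        model=EMBEDDING_MODEL,
    )['data'][0]['embedding']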
    query_results = qdrant.search(
        collection_name=collection_name,
        query_vector=(
            vector_name, embedded_query
        ),
        limit=top_k,
    )
    return query_results
Note: To answer questions based on text documents, we recommend the procedure in Question
Answering using Embeddings. Some of the code below may rely on deprecated API endpoints.
We will add hard adversarial examples as well, which will be based either on semantically similar
sections, or neighbouring sections, originating from the same article.
import openai
import pandas as pd
df = pd.read_csv('olympics-data/olympics_qa.csv')
olympics_search_fileid = "file-c3shd8wqF3vSCKaukW4Jr1TT"
df.head()
  title                 heading              content                                             tokens  context                                             questions                                           answers
0 2020 Summer Olympics  Summary              The 2020 Summer Olympics (Japanese: 2020年夏季オリン...   713    2020 Summer Olympics\nSummary\n\nThe 2020 Summ...  1. What is the 2020 Summer Olympics?\n2. When ...   1. The 2020 Summer Olympics is an internationa...
1 2020 Summer Olympics  Host city selection  The International Olympic Committee (IOC) vote...   126    2020 Summer Olympics\nHost city selection\n\nT...  1. \n2. \n3. \n4.                                   1. What is the International Olympic Committee...
(3014, 754)
We check that the separator we intend to use isn't present within the contexts:
df.context.str.contains('->').sum()
The fine-tuning dataset is created in the following way. For every corresponding question,
answer and context pair we create:
- a positive example, where the question is paired with its own context
- a random negative example, where a random context is paired with the question
- optionally, a hard negative example, where a related context (a semantically similar or neighbouring section from the same article) is paired with the question
This process is noisy, as sometimes the question might be answerable given a different context,
but on average we hope this won't affect the performance too much.
We apply the same process of dataset creation for both the discriminator, and the Q&A
answering model. We apply the process separately for the training and testing set, to ensure
that the examples from the training set don't feature within the test set.
import random

def create_fine_tuning_dataset(df, discriminator=False, n_negative=1, add_related=False):
    """
    Create a dataset for fine-tuning; either for a discriminator model, or for a Q&A model
    which says when no relevant context is found.

    Parameters
    ----------
    df: pd.DataFrame
        The dataframe containing the question, answer and context pairs
    discriminator: bool
        Whether to create a dataset for the discriminator
    n_negative: int
        The number of random negative samples to add (using a random context)
    add_related: bool
        Whether to add the related contexts to the correct context. These are hard negative examples

    Returns
    -------
    pd.DataFrame
        The dataframe containing the prompts and completions, ready for fine-tuning
    """
    rows = []
    for i, row in df.iterrows():
        for q, a in zip(("1." + row.questions).split('\n'), ("1." + row.answers).split('\n')):
            if len(q) > 10 and len(a) > 10:
                # positive example: the question paired with its own context
                if discriminator:
                    rows.append({"prompt": f"{row.context}\nQuestion: {q[2:].strip()}\n Related:", "completion": " yes"})
                else:
                    rows.append({"prompt": f"{row.context}\nQuestion: {q[2:].strip()}\nAnswer:", "completion": f" {a[2:].strip()}"})

                # negative examples: random contexts plus (optionally) related contexts as hard negatives
                for j in range(n_negative + (2 if add_related else 0)):
                    if j == 0 and add_related:
                        # neighbouring sections from the same article (reconstructed selection logic)
                        subset = df[(df.title == row.title) & (df.context != row.context)]
                        if len(subset) < 1:
                            continue
                        random_context = subset.sample(1).iloc[0].context
                    elif j == 1 and add_related:
                        # add the related contexts based on the most similar contexts according to the search
                        # (get_random_similar_contexts is defined in an earlier, elided cell; arguments reconstructed)
                        random_context = get_random_similar_contexts(q[2:].strip(), row.context, search_model='ada', max_rerank=10)
                    else:
                        while True:
                            # add random context, which isn't the correct context
                            random_context = df.sample(1).iloc[0].context
                            if random_context != row.context:
                                break
                    if discriminator:
                        rows.append({"prompt": f"{random_context}\nQuestion: {q[2:].strip()}\n Related:", "completion": " no"})
                    else:
                        rows.append({"prompt": f"{random_context}\nQuestion: {q[2:].strip()}\nAnswer:", "completion": " No appropriate context found to answer the question."})
    return pd.DataFrame(rows)
We apply the same process of dataset creation for both the discriminator, and the Q&A
answering model. We apply the process separately for the training and testing set, to ensure
that the examples from the training set don't feature within the test set.
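Applying the function above to the training and testing splits separately and writing one JSONL file per model could look like the following sketch (split variable names are assumptions):

for name, is_disc in [("discriminator", True), ("qa", False)]:
    for train_test, dt in [("train", train_df), ("test", test_df)]:
        ft = create_fine_tuning_dataset(dt, discriminator=is_disc, n_negative=1, add_related=True)
        ft.to_json(f"{name}_{train_test}.jsonl", orient="records", lines=True)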
We formatted the data according to the recommendations from the fine-tuning tool, which is
available via the OpenAI CLI's fine-tuning data preparation utility.
We highly recommend that you use this tool, which suggests improvements in your data
formatting for fine-tuning.
We will now use the fine-tuned discriminator and the fine-tuned Q&A model. By requesting
logprobs, we can see how certain the discriminator is in a yes vs no answer.
ft_discriminator = "curie:ft-openai-internal-2021-08-23-23-58-57"
ft_qa = "curie:ft-openai-internal-2021-08-23-17-54-10"
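The apply_ft_discriminator and apply_ft_qa_answer helpers used below are not defined in this excerpt; minimal sketches against the legacy Completions API that these curie fine-tunes target (prompt formats follow the training data above):

def apply_ft_discriminator(context, question, discriminator_model):
    # Ask the discriminator whether the question is answerable from the context;
    # logprobs=2 lets us inspect how confident it is in " yes" vs " no"
    prompt = f"{context}\nQuestion: {question}\n Related:"
    result = openai.Completion.create(
        model=discriminator_model, prompt=prompt, max_tokens=1, temperature=0, logprobs=2
    )
    return result["choices"][0]["logprobs"]["top_logprobs"]

def apply_ft_qa_answer(context, question, answering_model):
    # Ask the fine-tuned Q&A model to answer using the same prompt format it was trained on
    prompt = f"{context}\nQuestion: {question}\nAnswer:"
    result = openai.Completion.create(
        model=answering_model, prompt=prompt, max_tokens=30, temperature=0
    )
    return result["choices"][0]["text"]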
apply_ft_discriminator('The first human-made object in space was the Soviet Union satellite Sputnik 1
'What was the first human-made object in space?', ft_discriminator)
We can see that the model can generalize well to different contexts and questions.
apply_ft_qa_answer('The first human-made object in space was the Soviet Union satellite Sputnik 1 on
'What was the first human-made object in space?', ft_qa)
' The first human-made object in space was the Soviet Union satellite Sputnik 1 on 4 October 19
We can see that the model can answer the question, when the context is appropriate.
apply_ft_qa_answer('The first human-made object in space was the Soviet Union satellite Sputnik 1 on
'What is impressive about the Soviet Union?', ft_qa)
' The Soviet Union was the first country to successfully launch a satellite into space'
apply_ft_qa_answer('The first human-made object in space was the Soviet Union satellite Sputnik 1 on
'How many cars were produced in the Soviet Union in 1970?', ft_qa)
We can see that the model knows when to answer the question, and when to say that
insufficient context is present to answer the question.
We can also combine a discriminator and a base model, or a fine-tuned Q&A model. The
discriminator can essentially serve as a decision on whether the question can be answered given
the context or not.
The above function illustrates how to potentially combine a discriminator and a fine-tuned Q&A
model. This gives a more fine-grained control over how certain we want the model to be before
it answers the question.
We'll now take a look at how the answers endpoint works -- combining search to retrieve the
relevant context from a knowledge base, and then using the fine-tuned Q&A model to answer
the question.
3.4 Answering the question based on a knowledge base
Finally we can use a logic similar to the /answers endpoint, where we first search for the
relevant context, and then ask a Q&A model to answer the question given that context. If you'd
like to see the implementation details, check out the answers_with_ft.py file.
" Canada won the Women's football tournament at the 2020 Olympic games"
This notebook takes you through a simple flow to download some data, embed it, and then
index and search it using a selection of vector databases. This is a common requirement for
customers who want to store and search our embeddings with their own data in a secure
environment to support production use cases such as chatbots, topic modelling and more.
Vector databases enable enterprises to take many of the embeddings use cases we've shared in
this repo (question and answering, chatbot and recommendation services, for example), and
make use of them in a secure, scalable environment. Many of our customers use embeddings to
solve their problems at small scale, but performance and security hold them back from going
into production - we see vector databases as a key component in solving that, and in this guide
we'll walk through the basics of embedding text data, storing it in a vector database and using it
for semantic search.
Demo Flow
The demo flow is:
Setup: Import packages and set any required variables
Chroma:
Setup: Here we'll set up the Python client for Chroma. For more details go here
Index Data: We'll create collections with vectors for titles and content
Once you've run through this notebook you should have a basic understanding of how to setup
and use vector databases, and can move on to more complex use cases making use of our
embeddings.
Setup
Import the required libraries and set the embedding model that we'd like to use.
Collecting openai
Obtaining dependency information for openai from https://files.pythonhosted.org/packages/67/7
Downloading openai-0.27.8-py3-none-any.whl.metadata (13 kB)
Collecting requests>=2.20 (from openai)
Obtaining dependency information for requests>=2.20 from https://files.pythonhosted.org/packa
Using cached requests-2.31.0-py3-none-any.whl.metadata (4.6 kB)
Collecting tqdm (from openai)
Using cached tqdm-4.65.0-py3-none-any.whl (77 kB)
Collecting aiohttp (from openai)
Obtaining dependency information for aiohttp from https://files.pythonhosted.org/packages/fa/
Downloading aiohttp-3.8.5-cp310-cp310-macosx_11_0_arm64.whl.metadata (7.7 kB)
Collecting charset-normalizer<4,>=2 (from requests>=2.20->openai)
Obtaining dependency information for charset-normalizer<4,>=2 from https://files.pythonhosted
Using cached charset_normalizer-3.2.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (31 kB)
Collecting idna<4,>=2.5 (from requests>=2.20->openai)
Using cached idna-3.4-py3-none-any.whl (61 kB)
Collecting urllib3<3,>=1.21.1 (from requests>=2.20->openai)
Obtaining dependency information for urllib3<3,>=1.21.1 from https://files.pythonhosted.org/p
Downloading urllib3-2.0.4-py3-none-any.whl.metadata (6.6 kB)
Collecting certifi>=2017.4.17 (from requests>=2.20->openai)
Using cached certifi-2023.5.7-py3-none-any.whl (156 kB)
Collecting attrs>=17.3.0 (from aiohttp->openai)
Using cached attrs-23.1.0-py3-none-any.whl (61 kB)
Collecting multidict<7.0,>=4.5 (from aiohttp->openai)
Using cached multidict-6.0.4-cp310-cp310-macosx_11_0_arm64.whl (29 kB)
Collecting async-timeout<5.0,>=4.0.0a3 (from aiohttp->openai)
Using cached async_timeout-4.0.2-py3-none-any.whl (5.8 kB)
Collecting yarl<2.0,>=1.0 (from aiohttp->openai)
import openai
import pandas as pd
import os
import wget
from ast import literal_eval
# I've set this to our new embeddings model, this can be changed to the embedding model of your choice
EMBEDDING_MODEL = "text-embedding-3-small"
# Ignore unclosed SSL socket warnings - optional in case you get these errors
import warnings
Load data
In this section we'll load embedded data that we've prepared previous to this session.
embeddings_url = 'https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip'
wget.download(embeddings_url)

'vector_database_wikipedia_articles_embedded.zip'
import zipfile
with zipfile.ZipFile("vector_database_wikipedia_articles_embedded.zip","r") as zip_ref:
zip_ref.extractall("../data")
article_df = pd.read_csv('../data/vector_database_wikipedia_articles_embedded.csv')
article_df.head()
article_df.info(show_counts=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25000 entries, 0 to 24999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 25000 non-null int64
1 url 25000 non-null object
2 title 25000 non-null object
3 text 25000 non-null object
4 title_vector 25000 non-null object
5 content_vector 25000 non-null object
6 vector_id 25000 non-null object
dtypes: int64(1), object(6)
memory usage: 1.3+ MB
Chroma
We'll index these embedded documents in a vector database and search them. The first option
we'll look at is Chroma, an easy to use open-source self-hosted in-memory vector database,
designed for working with embeddings together with LLMs.
Create the Chroma client. By default, Chroma is ephemeral and runs in memory. However, you
can easily set up a persistent configuration which writes to disk.
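The client creation cell is not shown here; a minimal in-memory client looks like this (swap in chromadb.PersistentClient(path=...) for the persistent configuration):

import chromadb

chroma_client = chromadb.EphemeralClient()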
Create collections
Chroma collections allow you to store and filter with arbitrary metadata, making it easy to query
subsets of the embedded data.
Chroma is already integrated with OpenAI's embedding functions. The best way to use them is
on construction of a collection, as follows. Alternatively, you can 'bring your own embeddings'.
More information can be found here
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
# Test that your OpenAI API key is correctly set as an environment variable
# Note. if you run this notebook locally, you will need to reload your terminal and the notebook for
# Note. alternatively you can set a temporary env variable like this:
# os.environ["OPENAI_API_KEY"] = 'sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
OPENAI_API_KEY is ready
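The cell constructing the two collections is not shown in this excerpt; a minimal sketch using the embedding function imported above (collection names are assumptions consistent with the queries below):

embedding_function = OpenAIEmbeddingFunction(api_key=os.environ.get("OPENAI_API_KEY"), model_name=EMBEDDING_MODEL)

wikipedia_content_collection = chroma_client.create_collection(name="wikipedia_content", embedding_function=embedding_function)
wikipedia_title_collection = chroma_client.create_collection(name="wikipedia_titles", embedding_function=embedding_function)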
For this use-case, we'll just store the embeddings and IDs, and use these to index the original
dataframe.
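The cells that populate the collections and define the query helper (whose closing return df line appears below) are not shown in full; a sketch consistent with the calls that follow:

# Convert the vector columns from strings back into lists, then add embeddings and IDs to each collection
article_df["title_vector"] = article_df.title_vector.apply(literal_eval)
article_df["content_vector"] = article_df.content_vector.apply(literal_eval)

wikipedia_title_collection.add(ids=article_df.vector_id.tolist(), embeddings=article_df.title_vector.tolist())
wikipedia_content_collection.add(ids=article_df.vector_id.tolist(), embeddings=article_df.content_vector.tolist())

def query_collection(collection, query, max_results, dataframe):
    # Chroma embeds the query text with the collection's embedding function and returns the closest IDs,
    # which we join back onto the original dataframe
    results = collection.query(query_texts=query, n_results=max_results, include=["distances"])
    df = pd.DataFrame({
        "id": results["ids"][0],
        "score": results["distances"][0],
        "title": dataframe[dataframe.vector_id.isin(results["ids"][0])]["title"],
        "content": dataframe[dataframe.vector_id.isin(results["ids"][0])]["text"],
    })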
    return df
title_query_result = query_collection(
collection=wikipedia_title_collection,
query="modern art in Europe",
max_results=10,
dataframe=article_df
)
title_query_result.head()
id score title content
11777 15436 0.271688 Hellenistic art The art of the Hellenistic time (from 400 B.C....
12178 23265 0.279306 Byzantine art Byzantine art is a form of Christian Greek art...
13215 11777 0.294415 Art film Art films are a type of movie that is very dif...
15436 22108 0.305937 Renaissance art Many of the most famous and best-loved works o...
content_query_result = query_collection(
collection=wikipedia_content_collection,
query="Famous battles in Scottish history",
max_results=10,
dataframe=article_df
)
content_query_result.head()
id score title content
6248 2923 0.294823 841 \n\nEvents \n June 25: Battle of Fontenay – Lo...
11702 11708 0.307572 William Wallace William Wallace was a Scottish knight who foug...
Now that you've got a basic embeddings search running, you can hop over to the Chroma docs
to learn more about how to add filters to your query, update/delete data in your collections,
and deploy Chroma.
We will upload the data to Weights & Biases and use an Embedding Projector to visualize the
embeddings using common dimension reduction algorithms like PCA, UMAP, and t-SNE. The
dataset is created in the Get_embeddings_from_dataset Notebook.
Weights & Biases is a machine learning platform used by OpenAI and other ML teams to build
better models faster. They use it to quickly track experiments, evaluate model performance,
reproduce models, visualize results, and share findings with colleagues.
We create a W&B Table with the original data and the embeddings. Each review is a new row
and the 1536 embedding floats are given their own column named emb_{i} .
import pandas as pd
from sklearn.manifold import TSNE
import numpy as np
from ast import literal_eval
import wandb
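The cells that load the reviews-with-embeddings file and build the embedding matrix are not shown here; a minimal sketch (file and column names are assumed to match the Get_embeddings_from_dataset notebook):

df = pd.read_csv("data/fine_food_reviews_with_embeddings_1k.csv", index_col=0)
matrix = np.array(df.embedding.apply(literal_eval).to_list())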
original_cols = df.columns[1:-1].tolist()
embedding_cols = ['emb_'+str(idx) for idx in range(len(matrix[0]))]
table_cols = original_cols + embedding_cols
with wandb.init(project='openai_embeddings'):
table = wandb.Table(columns=table_cols)
for i, row in enumerate(df.to_dict(orient="records")):
original_data = [row[col_name] for col_name in original_cols]
embedding_data = matrix[i].tolist()
table.add_data(*(original_data + embedding_data))
wandb.log({'openai_embedding_table': table})
2. Render as 2D Projection
After navigating to the W&B run link, we click the ⚙️ icon in the top right of the Table and
change "Render As:" to "Combined 2D Projection".
Example: http://wandb.me/openai_embeddings
This notebook presents how to implement a Question Answering system with Langchain,
AnalyticDB as the knowledge base, and OpenAI embeddings. If you are not familiar with
AnalyticDB, it’s better to check out the Getting_started_with_AnalyticDB_and_OpenAI.ipynb
notebook.
Using AnalyticDB to perform the nearest neighbour search in the created collection to find
some context.
All the steps will be simplified to calling some corresponding Langchain methods.
Prerequisites
For the purposes of this exercise we need to prepare a couple of things: AnalyticDB cloud
instance. Langchain as a framework. An OpenAI API key.
Install requirements
This notebook requires the following Python packages: openai , tiktoken , langchain and
psycopg2cffi .
openai provides convenient access to the OpenAI API.
psycopg2cffi library is used to interact with the vector database, but any other
PostgreSQL client library is also acceptable.
# Test that your OpenAI API key is correctly set as an environment variable
# Note: if you run this notebook locally, you will need to reload your terminal and the notebook for the variable to take effect.
import os

# Note: alternatively you can set a temporary env variable like this:
# os.environ["OPENAI_API_KEY"] = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

if os.getenv("OPENAI_API_KEY") is not None:
    print("OPENAI_API_KEY is ready")
else:
    print("OPENAI_API_KEY environment variable not found")

OPENAI_API_KEY is ready
The OpenAI API key is used for vectorization of the documents and queries.
If you don't have an OpenAI API key, you can get one from
https://platform.openai.com/account/api-keys.
Once you get your key, please add it to your environment variables as OPENAI_API_KEY by
running the following command:
import os
from langchain.vectorstores.analyticdb import AnalyticDB
CONNECTION_STRING = AnalyticDB.connection_string_from_db_params(
driver=os.environ.get("PG_DRIVER", "psycopg2cffi"),
host=os.environ.get("PG_HOST", "localhost"),
port=int(os.environ.get("PG_PORT", "5432")),
database=os.environ.get("PG_DATABASE", "postgres"),
user=os.environ.get("PG_USER", "postgres"),
password=os.environ.get("PG_PASSWORD", "postgres"),
)
import json
Load data
In this section we are going to load the data containing some natural questions and answers to
them. All the data will be used to create a Langchain application with AnalyticDB being the
knowledge base.
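The cell that actually loads the dataset is not reproduced here; a minimal sketch, assuming the questions and answers ship as two local JSON files (the file names questions.json and answers.json are assumptions):
# Hypothetical file names; adjust to wherever the notebook's dataset lives
with open("questions.json", "r") as fp:
    questions = json.load(fp)
with open("answers.json", "r") as fp:
    answers = json.load(fp)
print(len(questions), len(answers))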
print(questions[0])
print(answers[0])
No . overall No. in season Title Directed by Written by Original air date U.S. viewers ( millio
Chain definition
Langchain is already integrated with AnalyticDB and performs all the indexing for a given list of
documents. In our case we are going to store the set of answers we have.
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
doc_store = AnalyticDB.from_texts(
    texts=answers, embedding=embeddings, connection_string=CONNECTION_STRING,
    pre_delete_collection=True,
)
At this stage all the possible answers are already stored in AnalyticDB, so we can define the
whole QA chain.
from langchain.llms import OpenAI
from langchain.chains import VectorDBQA

llm = OpenAI()
qa = VectorDBQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    vectorstore=doc_store,
    return_source_documents=False,
)
Search data
Once the data is put into AnalyticDB we can start asking some questions. A question will be
automatically vectorized by OpenAI model, and the created vector will be used to find some
possibly matching answers in AnalyticDB. Once retrieved, the most similar answers will be
incorporated into the prompt sent to OpenAI Large Language Model.
import random
random.seed(52)
selected_questions = random.choices(questions, k=5)
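Running the chain over the sampled questions looks roughly like this (a sketch mirroring the custom-prompt loop shown later in this notebook):
for question in selected_questions:
    print(">", question)
    print(qa.run(question), end="\n\n")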
Use the following pieces of context to answer the question at the end. If you don't know the answer
{context}
Question: {question}
Helpful Answer:
We can, however, provide our own prompt template and change the behaviour of the OpenAI LLM,
while still using the stuff chain type. It is important to keep {context} and {question} as
placeholders. For example, the custom prompt below answers from the provided context and
suggests a random song title if it doesn't know the answer to our question.
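The custom_prompt string itself is not reproduced here; a minimal sketch of what it could look like (the exact wording is an assumption):
from langchain.prompts import PromptTemplate  # used by the cell below

custom_prompt = """
Use the following pieces of context to answer the question at the end. Please provide
a short single-sentence answer. If you don't know the answer, don't try to make one up,
but instead suggest a random unrelated song title I could listen to.

{context}

Question: {question}
Helpful Answer:
"""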
custom_prompt_template = PromptTemplate(
template=custom_prompt, input_variables=["context", "question"]
)
custom_qa = VectorDBQA.from_chain_type(
llm=llm,
chain_type="stuff",
vectorstore=doc_store,
return_source_documents=False,
chain_type_kwargs={"prompt": custom_prompt_template},
)
random.seed(41)
for question in random.choices(questions, k=5):
print(">", question)
print(custom_qa.run(question), end="\n\n")
> what was uncle jesse's original last name on full house
Uncle Jesse's original last name on Full House was Cochran.
This notebook gives an example on how to get embeddings from a large dataset.
The dataset used in this example is fine-food reviews from Amazon. The dataset contains a
total of 568,454 food reviews Amazon users left up to October 2012. We will use a subset of this
dataset, consisting of 1,000 most recent reviews for illustration purposes. The reviews are in
English and tend to be positive or negative. Each review has a ProductId, UserId, Score, review
title (Summary) and review body (Text).
We will combine the review summary and review text into a single combined text. The model
will encode this combined text and it will output a single vector embedding.
To run this notebook, you will need to install: pandas, openai, transformers, plotly, matplotlib,
scikit-learn, torch (transformer dep), torchvision, and scipy.
import pandas as pd
import tiktoken
embedding_model = "text-embedding-3-small"
embedding_encoding = "cl100k_base"
max_tokens = 8000 # the maximum for text-embedding-3-small is 8191
0 | Time: 1351123200 | ProductId: B003XPF9BO | UserId: A3R7JR3FMEBXQB | Score: 5 | Summary: where does one start...and stop... with a tre... | Text: Wanted to save some to bring to my Chicago fam... | combined: Title: where does one start...and stop... wit...
# subsample to 1k most recent reviews and remove samples that are too long
top_n = 1000
df = df.sort_values("Time").tail(top_n * 2) # first cut to first 2k entries, assuming less than half
df.drop("Time", axis=1, inplace=True)
encoding = tiktoken.get_encoding(embedding_encoding)
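The cells that build the combined text and drop over-long reviews are not shown above; a minimal sketch (the combined column follows the description earlier in this notebook, other details are assumptions):
# Combine the review title and body into a single string to embed
df["combined"] = "Title: " + df.Summary.str.strip() + "; Content: " + df.Text.str.strip()

# Drop reviews that are too long to embed, then keep the top_n most recent
df["n_tokens"] = df.combined.apply(lambda x: len(encoding.encode(x)))
df = df[df.n_tokens <= max_tokens].tail(top_n)
len(df)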
1000
# Ensure you have your API key set in your environment per the README: https://github.com/openai/open
a = get_embedding("hi", model=embedding_model)
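To embed the whole subset and save it for the other notebooks, something like the following works (get_embedding is assumed to come from the cookbook's embeddings utils helper; the output path is an assumption):
# This may take a few minutes for 1,000 reviews
df["embedding"] = df.combined.apply(lambda x: get_embedding(x, model=embedding_model))
df.to_csv("data/fine_food_reviews_with_embeddings_1k.csv")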
This notebook provides a step-by-step guide on using MyScale as a vector database for OpenAI
embeddings. The process includes, among other steps, leveraging MyScale to perform nearest
neighbor search within the created collection.
What is MyScale
MyScale is a database built on Clickhouse that combines vector search and SQL analytics to
offer a high-performance, streamlined, and fully managed experience. It's designed to facilitate
joint queries and analyses on both structured and vector data, with comprehensive SQL support
for all data processing.
Deployment options
Deploy and execute vector search with SQL on your cluster within two minutes by using
MyScale Console.
Prerequisites
Install requirements
This notebook requires the openai and clickhouse-connect packages, as well as some other
dependencies. Use the following command to install them:
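The install command itself is not reproduced above; it is roughly:
! pip install openai clickhouse-connect wget pandas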
import openai
Connect to MyScale
Follow the connections details section to retrieve the cluster host, username, and password
information from the MyScale console, and use it to create a connection to your cluster as
shown below:
import clickhouse_connect

# initialize client (replace the placeholders with your cluster's host, username and password)
client = clickhouse_connect.get_client(
    host='YOUR_CLUSTER_HOST',
    port=8443,
    username='YOUR_USERNAME',
    password='YOUR_CLUSTER_PASSWORD',
)
Load data
We need to load the dataset of precomputed vector embeddings for Wikipedia articles
provided by OpenAI. Use the wget package to download the dataset.
import wget
embeddings_url = "https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedde
After the download is complete, extract the file using the zipfile package:
import zipfile
import pandas as pd
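The download, extraction and loading cells are not shown above; a minimal sketch (the local paths are assumptions, and the vector columns stored as strings in the CSV are parsed back into Python lists):
from ast import literal_eval

wget.download(embeddings_url)

with zipfile.ZipFile("vector_database_wikipedia_articles_embedded.zip", "r") as zip_ref:
    zip_ref.extractall("../data")

article_df = pd.read_csv("../data/vector_database_wikipedia_articles_embedded.csv")
article_df["title_vector"] = article_df.title_vector.apply(literal_eval)
article_df["content_vector"] = article_df.content_vector.apply(literal_eval)
article_df.head()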
Index data
We will create an SQL table called articles in MyScale to store the embeddings data. The
table will include a vector index with a cosine distance metric and a constraint for the length of
the embeddings. Use the following code to create and insert data into the articles table:
# The embedding dimensionality is derived from the loaded data (1536 for the OpenAI embeddings used for this dataset)
embedding_len = len(article_df['content_vector'][0])

client.command(f"""
CREATE TABLE IF NOT EXISTS default.articles
(
id UInt64,
url String,
title String,
text String,
content_vector Array(Float32),
CONSTRAINT cons_vector_len CHECK length(content_vector) = {embedding_len},
VECTOR INDEX article_content_index content_vector TYPE HNSWFLAT('metric_type=Cosine')
)
ENGINE = MergeTree ORDER BY id
""")
batch_size = 100
total_records = len(article_df)
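The insertion loop is not shown above; a sketch using clickhouse-connect's insert method (column order must match the table definition, and this assumes the vector columns were parsed into Python lists as above):
column_names = ["id", "url", "title", "text", "content_vector"]
for start in range(0, total_records, batch_size):
    batch = article_df.iloc[start:start + batch_size]
    rows = batch[column_names].values.tolist()
    client.insert("default.articles", rows, column_names=column_names)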
We need to check the build status of the vector index before proceeding with the search, as it is
automatically built in the background.
# check the status of the vector index, make sure vector index is ready with 'Built' status
get_index_status="SELECT status FROM system.vector_indices WHERE name='article_content_index'"
print(f"index build status: {client.command(get_index_status)}")
Search data
Once indexed in MyScale, we can perform vector search to find similar content. First, we will use
the OpenAI API to generate embeddings for our query. Then, we will perform the vector search
using MyScale.
import openai
# query the database to find the top K similar content to the given query
top_k = 10
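# The cell that creates the query embedding is not shown above. A minimal sketch, assuming
# the pre-1.0 openai client and the same embedding model used to build the stored vectors:
query = "Famous battles in Scottish history"
embed = openai.Embedding.create(
    input=query,
    model="text-embedding-ada-002",  # assumption: must match the model behind content_vector
)["data"][0]["embedding"]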
results = client.query(f"""
SELECT id, url, title, distance(content_vector, {embed}) as dist
FROM default.articles
ORDER BY dist
LIMIT {top_k}
""")
# display results
for i, r in enumerate(results.named_results()):
print(i+1, r['title'])
1 Battle of Bannockburn
2 Wars of Scottish Independence
3 1651
4 First War of Scottish Independence
5 Robert I of Scotland
6 841
7 1716
8 1314
9 1263
10 William Wallace
This notebook provides step-by-step instructions on using Azure AI Search (f.k.a. Azure Cognitive
Search) as a vector database with OpenAI embeddings. Azure AI Search is a cloud search service
that gives developers infrastructure, APIs, and tools for building a rich search experience over
private, heterogeneous content in web, mobile, and enterprise applications.
Prerequisites:
For the purposes of this exercise you must have the following:
import json
import wget
import pandas as pd
import zipfile
from openai import AzureOpenAI
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient, SearchIndexingBufferedSender
from azure.search.documents.indexes import SearchIndexClient
from azure.search.documents.models import (
QueryAnswerType,
QueryCaptionType,
QueryType,
VectorizedQuery,
)
from azure.search.documents.indexes.models import (
HnswAlgorithmConfiguration,
HnswParameters,
SearchField,
SearchableField,
SearchFieldDataType,
SearchIndex,
SemanticConfiguration,
SemanticField,
SemanticPrioritizedFields,
SemanticSearch,
SimpleField,
VectorSearch,
VectorSearchAlgorithmKind,
VectorSearchAlgorithmMetric,
VectorSearchProfile,
)
This section guides you through setting up authentication for Azure OpenAI, allowing you to
securely interact with the service using either Azure Active Directory (AAD) or an API key. Before
proceeding, ensure you have your Azure OpenAI endpoint and credentials ready. For detailed
instructions on setting up AAD with Azure OpenAI, refer to the official documentation.
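The configuration cell is not shown above; a minimal sketch (the API version is an assumption, and the environment variable names match those used later in this document):
import os

endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ.get("AZURE_OPENAI_API_KEY")
api_version = "2023-09-01-preview"  # assumption: pick a version supported by your resource

# AAD token provider, used when use_aad_for_aoai is True
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)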
# Set this flag to True if you are using Azure Active Directory
use_aad_for_aoai = True

if use_aad_for_aoai:
    # Use Azure Active Directory (AAD) authentication
    client = AzureOpenAI(
        azure_endpoint=endpoint,
        api_version=api_version,
        azure_ad_token_provider=token_provider,
    )
else:
    # Use API key authentication
    client = AzureOpenAI(
        api_key=api_key,
        api_version=api_version,
        azure_endpoint=endpoint,
    )
This section explains how to set up the Azure AI Search client for integrating with the Vector
Store feature. You can locate your Azure AI Search service details in the Azure Portal or
programmatically via the Search Management SDK.
# Configuration
search_service_endpoint: str = "YOUR_AZURE_SEARCH_ENDPOINT"
search_service_api_key: str = "YOUR_AZURE_SEARCH_ADMIN_KEY"
index_name: str = "azure-ai-search-openai-cookbook-demo"
# Set this flag to True if you are using Azure Active Directory
use_aad_for_search = True

if use_aad_for_search:
    # Use Azure Active Directory (AAD) authentication
    credential = DefaultAzureCredential()
else:
    # Use API key authentication
    credential = AzureKeyCredential(search_service_api_key)
Load data
embeddings_url = "https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedde
'vector_database_wikipedia_articles_embedded.zip'
with zipfile.ZipFile("vector_database_wikipedia_articles_embedded.zip", "r") as zip_ref:
    zip_ref.extractall("../../data")
article_df = pd.read_csv("../../data/vector_database_wikipedia_articles_embedded.csv")
Create an index
This code snippet demonstrates how to define and create a search index using the
SearchIndexClient from the Azure AI Search Python SDK. The index incorporates both vector
search and semantic ranker capabilities. For more details, visit our documentation on how to
Create a Vector Index
# Initialize the SearchIndexClient
index_client = SearchIndexClient(
endpoint=search_service_endpoint, credential=credential
)
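The field and configuration definitions referenced below are not shown above; a condensed sketch built from the classes imported earlier (field names follow the Wikipedia dataset; dimensions, parameters and configuration names are assumptions):
fields = [
    SimpleField(name="id", type=SearchFieldDataType.String, key=True),
    SimpleField(name="vector_id", type=SearchFieldDataType.String),
    SimpleField(name="url", type=SearchFieldDataType.String),
    SearchableField(name="title", type=SearchFieldDataType.String),
    SearchableField(name="text", type=SearchFieldDataType.String),
    SearchField(
        name="content_vector",
        type=SearchFieldDataType.Collection(SearchFieldDataType.Single),
        vector_search_dimensions=1536,
        vector_search_profile_name="my-vector-profile",
    ),
]

vector_search = VectorSearch(
    algorithms=[
        HnswAlgorithmConfiguration(
            name="my-hnsw",
            kind=VectorSearchAlgorithmKind.HNSW,
            parameters=HnswParameters(
                m=4, ef_construction=400, ef_search=500,
                metric=VectorSearchAlgorithmMetric.COSINE,
            ),
        )
    ],
    profiles=[VectorSearchProfile(name="my-vector-profile", algorithm_configuration_name="my-hnsw")],
)

semantic_search = SemanticSearch(
    configurations=[
        SemanticConfiguration(
            name="my-semantic-config",
            prioritized_fields=SemanticPrioritizedFields(
                title_field=SemanticField(field_name="title"),
                content_fields=[SemanticField(field_name="text")],
            ),
        )
    ]
)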
# Create the search index with the vector search and semantic search configurations
index = SearchIndex(
name=index_name,
fields=fields,
vector_search=vector_search,
semantic_search=semantic_search,
)
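The call that actually creates the index on the service is not shown; it is the standard SearchIndexClient method:
result = index_client.create_or_update_index(index)
print(f"{result.name} created")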
azure-ai-search-openai-cookbook-demo created
The following code snippet outlines the process of uploading a batch of documents—
specifically, Wikipedia articles with pre-computed embeddings—from a pandas DataFrame to
an Azure AI Search index. For a detailed guide on data import strategies and best practices,
refer to Data Import in Azure AI Search.
# Convert the 'id' and 'vector_id' columns to string so one of them can serve as our key field
article_df["id"] = article_df["id"].astype(str)
article_df["vector_id"] = article_df["vector_id"].astype(str)
# Convert the DataFrame to a list of dictionaries
documents = article_df.to_dict(orient="records")
# Create a SearchIndexingBufferedSender
batch_client = SearchIndexingBufferedSender(
search_service_endpoint, index_name, credential
)
try:
    # Add upload actions for all documents in a single call
    batch_client.upload_documents(documents=documents)
    # Flush any remaining buffered documents and release resources
    batch_client.flush()
except Exception as e:
    print(f"Upload failed: {e}")
finally:
    batch_client.close()
If your dataset didn't already contain pre-computed embeddings, you can create embeddings
with the function below, which uses the openai Python library. You'll also notice that the same
function and model are used to generate query embeddings for performing vector searches.
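A minimal sketch of such a function, using the Azure OpenAI client created earlier (the model/deployment name is an assumption and should match your embedding deployment):
def generate_embeddings(text, model="text-embedding-3-small"):
    # Returns the embedding vector for a piece of text
    return client.embeddings.create(input=[text], model=model).data[0].embedding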
first_document_content = documents[0]["text"]
print(f"Content: {first_document_content[:100]}")
Content: April is the fourth month of the year in the Julian and Gregorian calendars, and comes
Content vector generated
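The cells that build the search client and the vector query are not shown above; a minimal sketch (the query text is inferred from the results printed below and may differ from the original notebook):
# Client for querying the index
search_client = SearchClient(
    endpoint=search_service_endpoint,
    index_name=index_name,
    credential=credential,
)

# Pure vector query built from the embedded query text
query = "modern art in Europe"
vector_query = VectorizedQuery(
    vector=generate_embeddings(query),
    k_nearest_neighbors=3,
    fields="content_vector",
)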
results = search_client.search(
search_text=None,
vector_queries= [vector_query],
select=["title", "text", "url"]
)
Title: Documenta
Score: 0.8599451
URL: https://simple.wikipedia.org/wiki/Documenta
Title: Expressionism
Score: 0.852354
URL: https://simple.wikipedia.org/wiki/Expressionism
Hybrid search combines the capabilities of traditional keyword-based search with vector-based
similarity search to provide more relevant and contextual results. This approach is particularly
useful when dealing with complex queries that benefit from understanding the semantic
meaning behind the text.
The provided code snippet demonstrates how to execute a hybrid search query:
# Hybrid Search
query = "Famous battles in Scottish history"

# Rebuild the vector query from the new query text (mirroring the pure vector search above)
vector_query = VectorizedQuery(
    vector=generate_embeddings(query), k_nearest_neighbors=3, fields="content_vector"
)

results = search_client.search(
    search_text=query,
    vector_queries=[vector_query],
    select=["title", "text", "url"],
    top=3
)
Title: Scottish
Score: 0.016393441706895828
URL: https://simple.wikipedia.org/wiki/Scottish
results = search_client.search(
search_text=query,
vector_queries=[vector_query],
select=["title", "text", "url"],
query_type=QueryType.SEMANTIC,
semantic_configuration_name="my-semantic-config",
query_caption=QueryCaptionType.EXTRACTIVE,
query_answer=QueryAnswerType.EXTRACTIVE,
top=3,
)
semantic_answers = results.get_answers()
for answer in semantic_answers:
    if answer.highlights:
        print(f"Semantic Answer: {answer.highlights}")
    else:
        print(f"Semantic Answer: {answer.text}")
    print(f"Semantic Answer Score: {answer.score}\n")
Semantic Answer: Advancements During the industrial revolution, new technology brought many ch
Semantic Answer Score: 0.90478515625
Title: Printing
Reranker Score: 1.603400707244873
URL: https://simple.wikipedia.org/wiki/Printing
Caption: Machines to speed printing, cheaper paper, automatic stitching and binding all arrived
Title: Industrialisation
Reranker Score: 1.3238357305526733
URL: https://simple.wikipedia.org/wiki/Industrialisation
Caption: <em>Industrialisation</em> (or<em> industrialization)</em> is a process that happens i
GPT-3 can help us extract key figures, dates or other bits of important content from documents
that are too big to fit into the context window. One approach for solving this is to chunk the
document up and process each chunk separately, before combining into one list of answers.
Chunk up our document and process each chunk to pull any answers out
This simple approach will then be extended to three more difficult questions
Approach
Setup: Take a PDF, a Formula 1 Financial Regulation document on Power Units, and extract
the text from it for entity extraction. We'll use this to try to extract answers that are buried
in the content.
Simple Entity Extraction: Extract key bits of information from chunks of a document by:
Creating a template prompt with our questions and an example of the format it expects
Creating a function that takes a chunk of text as input, combines it with the prompt and gets a response
Running a script to chunk the text, extract answers and output them for parsing
Complex Entity Extraction: Ask some more difficult questions which require tougher
reasoning to work out
Setup
import textract
import os
import openai
import tiktoken
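The cells that extract the PDF text and create the OpenAI client are not shown above; a minimal sketch (the file path is an assumption):
from openai import OpenAI

client = OpenAI()

# Extract the raw text from the PDF and normalise the whitespace
text = textract.process(
    "data/fia_f1_power_unit_financial_regulations.pdf", method="pdfminer"
).decode("utf-8")
clean_text = text.replace("  ", " ").replace("\n", "; ").replace(";", " ")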
# Example prompt -
document = '<document>'
template_prompt=f'''Extract key pieces of information from this regulation document.
If a particular piece of information is not present, output \"Not specified\".
When you extract a key piece of information, include the closest page number.
Use the following format:\n0. Who is the author\n1. What is the amount of the "Power Unit Cost Cap" i
print(template_prompt)
Document: """<document>"""
def extract_chunk(document, template_prompt):
    prompt = template_prompt.replace('<document>', document)

    messages = [
        {"role": "system", "content": "You help extract information from documents."},
        {"role": "user", "content": prompt}
    ]

    response = client.chat.completions.create(
        model='gpt-4',
        messages=messages,
        temperature=0,
        max_tokens=1500,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0
    )
    return "1." + response.choices[0].message.content
# Initialise tokenizer
tokenizer = tiktoken.get_encoding("cl100k_base")
results = []
chunks = create_chunks(clean_text,1000,tokenizer)
text_chunks = [tokenizer.decode(chunk) for chunk in chunks]
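The loop that runs the extraction over every chunk and collects the answers is not shown; roughly how the answers listed below were produced:
for chunk in text_chunks:
    results.append(extract_chunk(chunk, template_prompt))

# Split each response into individual numbered answers and drop the "Not specified" ones
groups = [r.split("\n") for r in results]
answers = [x for group in groups for x in group if "Not specified" not in x]
answers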
['1. What is the amount of the "Power Unit Cost Cap" in USD, GBP and EUR: USD 95,000,000 (Page
'2. What is the value of External Manufacturing Costs in USD: US Dollars 20,000,000 in respect
'3. What is the Capital Expenditure Limit in USD: US Dollars 30,000,000 (Page 32)']
# Example prompt -
template_prompt=f'''Extract key pieces of information from this regulation document.
If a particular piece of information is not present, output \"Not specified\".
When you extract a key piece of information, include the closest page number.
Use the following format:\n0. Who is the author\n1. How is a Minor Overspend Breach calculated\n2. Ho
print(template_prompt)
Document: """<document>"""
results = []
['1. How is a Minor Overspend Breach calculated: A Minor Overspend Breach arises when a Power U
'2. How is a Major Overspend Breach calculated: A Material Overspend Breach arises when a Powe
'3. Which years do these financial regulations apply to: 2026 onwards (Page 1)',
'3. Which years do these financial regulations apply to: 2023, 2024, 2025, 2026 and subsequent
'3. Which years do these financial regulations apply to: 2022-2025 (Page 6)',
'3. Which years do these financial regulations apply to: 2023, 2024, 2025, 2026 and subsequent
'3. Which years do these financial regulations apply to: 2022 (Page 14)',
'3. Which years do these financial regulations apply to: 2022 (Page 16)',
'3. Which years do these financial regulations apply to: 2022 (Page 19)',
'3. Which years do these financial regulations apply to: 2022 (Page 21)',
'3. Which years do these financial regulations apply to: 2026 onwards (Page 26)',
'3. Which years do these financial regulations apply to: 2026 (Page 2)',
'3. Which years do these financial regulations apply to: 2022 (Page 30)',
'3. Which years do these financial regulations apply to: 2022 (Page 32)',
'3. Which years do these financial regulations apply to: 2023, 2024 and 2025 (Page 1)',
'3. Which years do these financial regulations apply to: 2022 (Page 37)',
'3. Which years do these financial regulations apply to: 2026 onwards (Page 40)',
'3. Which years do these financial regulations apply to: 2022 (Page 1)',
'3. Which years do these financial regulations apply to: 2026 to 2030 seasons (Page 46)',
'3. Which years do these financial regulations apply to: 2022 (Page 47)',
'3. Which years do these financial regulations apply to: 2022 (Page 1)',
'3. Which years do these financial regulations apply to: 2022 (Page 1)',
'3. Which years do these financial regulations apply to: 2022 (Page 56)',
'3. Which years do these financial regulations apply to: 2022 (Page 1)',
'3. Which years do these financial regulations apply to: 2022 (Page 16)',
'3. Which years do these financial regulations apply to: 2022 (Page 16)']
Consolidation
We've been able to extract the first two answers safely, while the third was confounded by the
date that appeared on every page, though the correct answer is in there as well.
If you have sufficient training data, fine-tuning a model to extract this set of outputs can also
work very well. The way you chunk your data matters too - we have gone for 1000 tokens with no
overlap, but more intelligent chunking that breaks the text into sections, cuts by tokens, or
similar may get better results.
However, with minimal tuning we have now answered 6 questions of varying difficulty using the
contents of a long document, and have a reusable approach that we can apply to any long
document requiring entity extraction. We look forward to seeing what you can do with this!
Weaviate is an open-source vector search engine (docs - Github) that can store and search
through OpenAI embeddings and data objects. The database allows you to do similarity search,
hybrid search (the combining of multiple search techniques, such as keyword-based and vector
search), and generative search (like Q&A). Weaviate also supports a wide variety of OpenAI-
based modules (e.g., text2vec-openai , qna-openai ), allowing you to vectorize and query data
fast and efficiently.
You can run Weaviate (including the OpenAI modules if desired) in three ways:
Examples
Getting Started with Weaviate and OpenAI: a simple getting started for semantic vector search using the OpenAI vectorization module in Weaviate (text2vec-openai). Python Notebook, with a Google Colab link.
Hybrid Search with Weaviate and OpenAI: a simple getting started for hybrid search using the OpenAI vectorization module in Weaviate (text2vec-openai). Python Notebook, with a Google Colab link.
Question Answering with Weaviate and OpenAI: a simple getting started for question answering (Q&A) using the OpenAI Q&A module in Weaviate (qna-openai). Python Notebook, with a Google Colab link.
Using embeddings
Boris Power, Ted Sanders, Logan Kilpatrick
Open in Github
Mar 9, 2022
This notebook contains some helpful snippets you can use to embed text with the text-
embedding-3-small model via the OpenAI API.
import openai
embedding = openai.Embedding.create(
input="Your text goes here", model="text-embedding-3-small"
)["data"][0]["embedding"]
len(embedding)
1536
It's recommended to use the 'tenacity' package or another exponential backoff implementation
to better manage API rate limits, as hitting the API too much too fast can trigger rate limits.
Using the following function ensures you get your embeddings as fast as possible.
# Best practice
import openai
from tenacity import retry, wait_random_exponential, stop_after_attempt
# Retry up to 6 times with exponential backoff, starting at 1 second and maxing out at 20 seconds del
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))
def get_embedding(text: str, model="text-embedding-3-small") -> list[float]:
    return openai.Embedding.create(input=[text], model=model)["data"][0]["embedding"]
1536
Typesense
Jason Bosco
Open in Github
Apr 12, 2023
Typesense is an open source, in-memory search engine, that you can either self-host or run on
Typesense Cloud.
Why Typesense?
Typesense focuses on performance by storing the entire index in RAM (with a backup on disk)
and also focuses on providing an out-of-the-box developer experience by simplifying available
options and setting good defaults.
It also lets you combine attribute-based filtering together with vector queries, to fetch the most
relevant documents.
Other features
Besides vector storage and search, Typesense also offers the following features:
Sorting: Dynamically sort results based on a particular field at query time (helpful for
features like "Sort by Price (asc)").
Grouping & Distinct: Group similar results together to show more variety.
Federated Search: Search across multiple collections (indices) in a single HTTP request.
Scoped API Keys: Generate API keys that only allow access to certain records, for multi-
tenant applications.
Synonyms: Define words as equivalents of each other, so searching for a word will also
return results for the synonyms defined.
Curation & Merchandizing: Boost particular records to a fixed position in the search results,
to feature them.
No Runtime Dependencies: Typesense is a single binary that you can run locally or in
production with a single command.
How To
To learn more about how to use Typesense with OpenAI embeddings, see the notebook
here for an example:
examples/vector_databases/Using_vector_databases_for_embeddings_search.ipynb
To learn more about Typesense's vector search feature, read the docs here:
https://typesense.org/docs/0.24.1/api/vector-search.html.
This example will cover chat completions using the Azure OpenAI service. It also includes
information on content filtering.
Setup
First, we install the necessary dependencies and import the libraries we will be using.
import os
import openai
import dotenv
dotenv.load_dotenv()
Authentication
The Azure OpenAI service supports multiple authentication mechanisms that include API keys
and Azure Active Directory token credentials.
use_azure_active_directory = False # Set this flag to True if you are using Azure Active Directory
To set up the OpenAI SDK to use an Azure API Key, we need to set api_key to a key associated
with your endpoint (you can find this key in "Keys and Endpoints" under "Resource Management"
in the Azure Portal). You'll also find the endpoint for your resource here.
if not use_azure_active_directory:
    endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
    api_key = os.environ["AZURE_OPENAI_API_KEY"]

    client = openai.AzureOpenAI(
        azure_endpoint=endpoint,
        api_key=api_key,
        api_version="2023-09-01-preview"
    )
Let's now see how we can authenticate via Azure Active Directory. We'll start by installing the
azure-identity library. This library will provide the token credentials we need to authenticate
and help us build a token credential provider through the get_bearer_token_provider helper
function. It's recommended to use get_bearer_token_provider over providing a static token to
AzureOpenAI because this API will automatically cache and refresh tokens for you.
For more information on how to set up Azure Active Directory authentication with Azure
OpenAI, see the documentation.
if use_azure_active_directory:
    endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]

    client = openai.AzureOpenAI(
        azure_endpoint=endpoint,
        azure_ad_token_provider=get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiv
        api_version="2023-09-01-preview"
    )
Note: the AzureOpenAI client infers the following arguments from their corresponding
environment variables if they are not provided: api_key from AZURE_OPENAI_API_KEY,
azure_ad_token from AZURE_OPENAI_AD_TOKEN, api_version from OPENAI_API_VERSION, and
azure_endpoint from AZURE_OPENAI_ENDPOINT.
Deployments
In this section we are going to create a deployment of a GPT model that we can use to create
chat completions.
Let's deploy a model to use with chat completions. Go to https://portal.azure.com, find your
Azure OpenAI resource, and then navigate to the Azure OpenAI Studio. Click on the
"Deployments" tab and then create a deployment for the model you want to use for chat
completions. The deployment name that you give the model will be used in the code below.
deployment = "" # Fill in the deployment name from the portal here
print(f"{response.choices[0].message.role}: {response.choices[0].message.content}")
# (these checks run inside a loop over a streamed response, where delta = chunk.choices[0].delta)
if delta.role:
    print(delta.role + ": ", end="", flush=True)
if delta.content:
    print(delta.content, end="", flush=True)
Content filtering
Azure OpenAI service includes content filtering of prompts and completion responses. You can
learn more about content filtering and how to configure it here.
If the prompt is flagged by the content filter, the library will raise a BadRequestError exception
with a content_filter error code. Otherwise, you can access the prompt_filter_results and
content_filter_results on the response to see the results of the content filtering and which
categories were flagged.
import json

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "<text violating the content policy>"}
]

try:
    completion = client.chat.completions.create(
        messages=messages,
        model=deployment,
    )
except openai.BadRequestError as e:
    err = json.loads(e.response.text)
    if err["error"]["code"] == "content_filter":
        print("Content filter triggered!")
        content_filter_result = err["error"]["innererror"]["content_filter_result"]
        for category, details in content_filter_result.items():
            print(f"{category}:\n filtered={details['filtered']}\n severity={details['severity']}")
messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What's the biggest city in Washington?"}
]
completion = client.chat.completions.create(
messages=messages,
model=deployment,
)
print(f"Answer: {completion.choices[0].message.content}")
Note: To answer questions based on text documents, we recommend the procedure in Question
Answering using Embeddings. Some of the code below may rely on deprecated API endpoints.
To create a model which answers questions only if there is sufficient context for doing so, we
first create a dataset of questions and answers based on paragraphs of text. In order to train the
model to answer only when the answer is present, we also add adversarial examples, where the
question doesn't match the context. In those cases, we ask the model to output "No sufficient
context for answering the question".
1. The first (this) notebook focuses on collecting recent data, which GPT-3 didn't see during
its pre-training. We picked the topic of Olympic Games 2020 (which actually took place in
the summer of 2021), and downloaded 713 unique pages. We organized the dataset by
individual sections, which will serve as context for asking and answering the questions.
2. The second notebook will utilize Davinci-instruct to ask a few questions based on a
Wikipedia section, as well as answer those questions, based on that section.
3. The third notebook will utilize the dataset of context, question and answer pairs to
additionally create adversarial questions and context pairs, where the question was not
generated on that context. In those cases the model will be prompted to answer "No
sufficient context for answering the question". We will also train a discriminator model,
which predicts whether the question can be answered based on the context or not.
Extracting the data will take about half an hour, and processing will likely take about as much.
import pandas as pd
import wikipedia
def filter_olympic_2020_titles(titles):
    """
    Get the titles which are related to Olympic games hosted in 2020, given a list of titles
    """
    titles = [title for title in titles if '2020' in title and 'olympi' in title.lower()]
    return titles

def get_wiki_page(title):
    """
    Get the wikipedia page given a title
    """
    try:
        return wikipedia.page(title)
    except wikipedia.exceptions.DisambiguationError as e:
        return wikipedia.page(e.options[0])
    except wikipedia.exceptions.PageError as e:
        return None
909
We remove sections unlikely to contain textual information, and ensure that each section is not
longer than the token limit
import re
from typing import Set
from transformers import GPT2TokenizerFast
import numpy as np
from nltk.tokenize import sent_tokenize
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
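The count_tokens helper used below is not shown above; it simply counts tokens with the GPT-2 tokenizer:
def count_tokens(text: str) -> int:
    """Count the number of tokens in a string using the GPT-2 tokenizer."""
    return len(tokenizer.encode(text))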
def reduce_long(
    long_text: str, long_text_tokens: bool = False, max_len: int = 590
) -> str:
    """
    Reduce a long text to a maximum of `max_len` tokens by potentially cutting at a sentence end
    """
    if not long_text_tokens:
        long_text_tokens = count_tokens(long_text)
    if long_text_tokens > max_len:
        sentences = sent_tokenize(long_text.replace("\n", " "))
        ntokens = 0
        for i, sentence in enumerate(sentences):
            ntokens += 1 + count_tokens(sentence)
            if ntokens > max_len:
                return ". ".join(sentences[:i]) + "."
    return long_text
cont = contents.pop(0).strip()
outputs = [(title, "Summary", cont, count_tokens(cont)+4)]
return outputs
# Example section (ber is assumed to hold the sections extracted from one example page)
ber[-1]
1.2.1 We create a dataset and filter out any sections with fewer than 40 tokens, as those are unlikely to contain enough context to ask a good question.
res = []
for page in pages:
    res += extract_sections(page.content, page.title)
df = pd.DataFrame(res, columns=["title", "heading", "content", "tokens"])
df = df[df.tokens > 40]
df = df.drop_duplicates(['title', 'heading'])
df = df.reset_index().drop('index', axis=1)  # reset index
df.head()
Token indices sequence length is longer than the specified maximum sequence length for this mod
   title                 heading                                         content                                             tokens
0  2020 Summer Olympics  Summary                                         The 2020 Summer Olympics (Japanese: 2020年夏季オリン...  713
1  2020 Summer Olympics  Host city selection                             The International Olympic Committee (IOC) vote...   126
2  2020 Summer Olympics  Impact of the COVID-19 pandemic                 In January 2020, concerns were raised about th...    369
3  2020 Summer Olympics  Qualifying event cancellation and postponement  Concerns about the pandemic began to affect qu...    298
4  2020 Summer Olympics  Effect on doping tests                          Mandatory doping tests were being severely res...    163
df.to_csv('olympics-data/olympics_sections.csv', index=False)
df.title.value_counts().head()
There appear to be winter and summer Olympics 2020. We chose to leave a little ambiguity and
noise in the dataset, even though we were interested in only Summer Olympics 2020.
df.title.str.contains('Summer').value_counts()
True 3567
False 305
Name: title, dtype: int64
df.title.str.contains('Winter').value_counts()
False 3774
True 98
Name: title, dtype: int64
import pandas as pd
from matplotlib import pyplot as plt
df = pd.read_csv('olympics-data/olympics_sections.csv')
df[['tokens']].hist()
# add axis descriptions and title
plt.xlabel('Number of tokens')
plt.ylabel('Number of Wikipedia sections')
plt.title('Distribution of number of tokens in Wikipedia sections')
plt.show()
We can see that the majority of sections are fairly short (less than 500 tokens).
This notebook provides a step-by-step guide for our new gpt-3.5-turbo fine-tuning. We'll
perform entity extraction using the RecipeNLG dataset, which provides various recipes and a list
of extracted generic ingredients for each. This is a common dataset for named entity
recognition (NER) tasks.
1. Setup: Loading our dataset and filtering down to one domain to fine-tune on.
2. Data preparation: Preparing your data for fine-tuning by creating training and validation
examples, and uploading them to the Files endpoint.
By the end of this you should be able to train, evaluate and deploy a fine-tuned gpt-3.5-turbo
model.
For more information on fine-tuning, you can refer to our documentation guide, API reference
or blog post
Setup
# make sure to use the latest version of the openai python package
!pip install --upgrade openai
import json
import openai
import os
import pandas as pd
from pprint import pprint
Fine-tuning works best when focused on a particular domain. It's important to make sure your
dataset is focused enough for the model to learn, but general enough that unseen examples
won't be missed. Having this in mind, we have extracted a subset from the RecipeNLG dataset
to only contain documents from www.cookbooks.com.
recipe_df.head()
Creamy Corn ["2 (16 oz.) ["In a slow www.cookbooks.com/Recipe- www.cookbooks.com ["frozen
pkg. frozen cooker, Details.aspx?id=10570 corn",
corn", "1 (8 combine all "cream
2
oz.) pkg... ingredients. cheese",
C... "butter",
"gar...
Data preparation
We'll begin by preparing our data. When fine-tuning with the ChatCompletion format, each
training example is a simple list of messages . For example, an entry could look like:
[{'role': 'system',
'content': 'You are a helpful recipe assistant. You are to extract the generic ingredients from e
{'role': 'user',
'content': 'Title: No-Bake Nut Cookies\n\nIngredients: ["1 c. firmly packed brown sugar", "1/2 c
{'role': 'assistant',
'content': '["brown sugar", "milk", "vanilla", "nuts", "butter", "bite size shredded rice biscuit
During the training process this conversation will be split, with the final entry being the
completion that the model will produce, and the remainder of the messages acting as the
prompt. Consider this when building your training examples - if your model will act on multi-
turn conversations, then please provide representative examples so it doesn't perform poorly
when the conversation starts to expand.
Please note that currently there is a 4096 token limit for each training example. Anything longer
than this will be truncated at 4096 tokens.
training_data = []
system_message = "You are a helpful recipe assistant. You are to extract the generic ingredients from
def create_user_message(row):
    return f"""Title: {row['title']}\n\nIngredients: {row['ingredients']}\n\nGeneric ingredients: """

def prepare_example_conversation(row):
    messages = []
    messages.append({"role": "system", "content": system_message})

    user_message = create_user_message(row)
    messages.append({"role": "user", "content": user_message})

    # The assistant message is the target completion: the list of generic ingredients (NER column)
    messages.append({"role": "assistant", "content": row["NER"]})

    return {"messages": messages}
pprint(prepare_example_conversation(recipe_df.iloc[0]))
{'messages': [{'content': 'You are a helpful recipe assistant. You are to '
'extract the generic ingredients from each of the '
'recipes provided.',
'role': 'system'},
{'content': 'Title: No-Bake Nut Cookies\n'
'\n'
'Ingredients: ["1 c. firmly packed brown sugar", '
'"1/2 c. evaporated milk", "1/2 tsp. vanilla", "1/2 '
'c. broken nuts (pecans)", "2 Tbsp. butter or '
'margarine", "3 1/2 c. bite size shredded rice '
'biscuits"]\n'
'\n'
'Generic ingredients: ',
'role': 'user'},
{'content': '["brown sugar", "milk", "vanilla", "nuts", '
'"butter", "bite size shredded rice biscuits"]',
'role': 'assistant'}]}
Let's now do this for a subset of the dataset to use as our training data. You can begin with even
30-50 well-pruned examples. You should see performance continue to scale linearly as you
increase the size of the training set, but your jobs will also take longer.
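The cell that builds the training set is not shown; a sketch consistent with the validation split below (the first ~100 rows are used for training):
training_df = recipe_df.loc[0:100]
training_data = training_df.apply(prepare_example_conversation, axis=1).tolist()

for example in training_data[:5]:
    print(example)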
{'messages': [{'role': 'system', 'content': 'You are a helpful recipe assistant. You are to ext
{'messages': [{'role': 'system', 'content': 'You are a helpful recipe assistant. You are to ext
{'messages': [{'role': 'system', 'content': 'You are a helpful recipe assistant. You are to ext
{'messages': [{'role': 'system', 'content': 'You are a helpful recipe assistant. You are to ext
{'messages': [{'role': 'system', 'content': 'You are a helpful recipe assistant. You are to ext
In addition to training data, we can also optionally provide validation data, which will be used
to make sure that the model does not overfit your training set.
validation_df = recipe_df.loc[101:200]
validation_data = validation_df.apply(prepare_example_conversation, axis=1).tolist()
We then need to save our data as .jsonl files, with each line being one training example
conversation.
def write_jsonl(data_list: list, filename: str) -> None:
    with open(filename, "w") as out:
        for ddict in data_list:
            jout = json.dumps(ddict) + "\n"
            out.write(jout)
training_file_name = "tmp_recipe_finetune_training.jsonl"
write_jsonl(training_data, training_file_name)
validation_file_name = "tmp_recipe_finetune_validation.jsonl"
write_jsonl(validation_data, validation_file_name)
This is what the first 5 lines of our training .jsonl file look like:
{"messages": [{"role": "system", "content": "You are a helpful recipe assistant. You are to ext
{"messages": [{"role": "system", "content": "You are a helpful recipe assistant. You are to ext
{"messages": [{"role": "system", "content": "You are a helpful recipe assistant. You are to ext
{"messages": [{"role": "system", "content": "You are a helpful recipe assistant. You are to ext
{"messages": [{"role": "system", "content": "You are a helpful recipe assistant. You are to ext
Upload files
You can now upload the files to our Files endpoint to be used by the fine-tuned model.
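The upload cells are not shown above; a minimal sketch using the v1 Python SDK (the client created here is reused for the fine-tuning calls below):
from openai import OpenAI

client = OpenAI()

training_response = client.files.create(
    file=open(training_file_name, "rb"), purpose="fine-tune"
)
validation_response = client.files.create(
    file=open(validation_file_name, "rb"), purpose="fine-tune"
)
validation_file_id = validation_response.id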
training_file_id = training_response.id
Fine-tuning
Now we can create our fine-tuning job with the generated files and an optional suffix to identify
the model. The response will contain an id which you can use to retrieve updates on the job.
Note: The files have to first be processed by our system, so you might get a File not ready
error. In that case, simply retry a few minutes later.
response = client.fine_tuning.jobs.create(
training_file=training_file_id,
validation_file=validation_file_id,
model="gpt-3.5-turbo",
suffix="recipe-ner",
)
job_id = response.id
Once it is completed, you can use the result_files to sample the results from the validation
set (if you uploaded one), and use the ID from the fine_tuned_model parameter to invoke your
trained model.
response = client.fine_tuning.jobs.retrieve(job_id)
We can track the progress of the fine-tune with the events endpoint. You can rerun the cell
below a few times until the fine-tune is ready.
response = client.fine_tuning.jobs.list_events(job_id)
events = response.data
events.reverse()
Now that it's done, we can get a fine-tuned model ID from the job:
response = client.fine_tuning.jobs.retrieve(job_id)
fine_tuned_model_id = response.fine_tuned_model
if fine_tuned_model_id is None:
    raise RuntimeError("Fine-tuned model ID not found. Your job has likely not been completed yet.")

print("Fine-tuned model ID:", fine_tuned_model_id)
Inference
The last step is to use your fine-tuned model for inference. Similar to classic fine-tuning, you
simply call the Chat Completions API with your new fine-tuned model name filling the model
parameter.
test_df = recipe_df.loc[201:300]
test_row = test_df.iloc[0]
test_messages = []
test_messages.append({"role": "system", "content": system_message})
user_message = create_user_message(test_row)
test_messages.append({"role": "user", "content": user_message})
pprint(test_messages)
[{'content': 'You are a helpful recipe assistant. You are to extract the '
'generic ingredients from each of the recipes provided.',
'role': 'system'},
{'content': 'Title: Beef Brisket\n'
'\n'
'Ingredients: ["4 lb. beef brisket", "1 c. catsup", "1 c. water", '
'"1/2 onion, minced", "2 Tbsp. cider vinegar", "1 Tbsp. prepared '
'horseradish", "1 Tbsp. prepared mustard", "1 tsp. salt", "1/2 '
'tsp. pepper"]\n'
'\n'
'Generic ingredients: ',
'role': 'user'}]
response = client.chat.completions.create(
model=fine_tuned_model_id, messages=test_messages, temperature=0, max_tokens=500
)
print(response.choices[0].message.content)
["beef brisket", "catsup", "water", "onion", "cider vinegar", "horseradish", "mustard", "salt",
Conclusion
Congratulations, you are now ready to fine-tune your own models using the ChatCompletion
format! We look forward to seeing what you build.
We calculate user and product embeddings based on the training set, and evaluate the results
on the unseen test set. We will evaluate the results by plotting the user and product similarity
versus the review score. The dataset is created in the Get_embeddings_from_dataset
Notebook.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from ast import literal_eval
df['babbage_similarity'] = df["embedding"].apply(literal_eval).apply(np.array)
X_train, X_test, y_train, y_test = train_test_split(df, df.Score, test_size = 0.2, random_state=42)
user_embeddings = X_train.groupby('UserId').babbage_similarity.apply(np.mean)
prod_embeddings = X_train.groupby('ProductId').babbage_similarity.apply(np.mean)
len(user_embeddings), len(prod_embeddings)
(577, 706)
We can see that most of the users and products appear within the 50k examples only once.
To evaluate the recommendations, we look at the similarity of the user and product embeddings
amongst the reviews in the unseen test set. We calculate the cosine distance between the user
and product embeddings, which gives us a similarity score between 0 and 1. We then normalize
the scores to be evenly split between 0 and 1, by calculating the percentile of the similarity
score amongst all predicted scores.
We group the cosine similarity scores by the review score, and plot the distribution of cosine
similarity scores for each review score.
Correlation between user & vector similarity percentile metric and review number of stars (scor
We can observe a weak trend, showing that the higher the similarity score between the user and
the product embedding, the higher the review score. Therefore, the user and product
embeddings can weakly predict the review score - even before the user receives the product!
Because this signal works in a different way than the more commonly used collaborative
filtering, it can act as an additional feature to slightly improve the performance on existing
problems.
Vector Databases
Colin Jarvis, Moiz Sajid
Open in Github
Jun 27, 2023
This section of the OpenAI Cookbook showcases many of the vector databases available to
support your semantic search use cases.
Vector databases can be a great accompaniment for knowledge retrieval applications, which
reduce hallucinations by providing the LLM with the relevant context to answer questions.
Each provider has their own named directory, with a standard notebook to introduce you to
using our API with their product, and any supplementary notebooks they choose to add to
showcase their functionality.
AnalyticDB
Cassandra/Astra DB
Azure AI Search
Chroma
Elasticsearch
Hologres
Kusto
Milvus
MyScale
MongoDB
Neon Postgres
Pinecone
PolarDB
Qdrant
Redis
SingleStoreDB
Supabase
Tembo
Typesense
Weaviate
Zilliz
This notebook demonstrates how to use GPT's visual capabilities with a video. GPT-4 doesn't
take videos as input directly, but we can use vision and the new 128K context window to
describe the static frames of a whole video at once. We'll walk through two examples:
1. Using GPT-4 to get a description of a video
2. Generating a voiceover for a video with GPT-4 and the TTS API
import cv2 # We're using OpenCV to read video, to install !pip install opencv-python
import base64
import time
from openai import OpenAI
import os
import requests
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
First, we use OpenCV to extract frames from a nature video containing bisons and wolves:
video = cv2.VideoCapture("data/bison.mp4")

base64Frames = []
while video.isOpened():
    success, frame = video.read()
    if not success:
        break
    _, buffer = cv2.imencode(".jpg", frame)
    base64Frames.append(base64.b64encode(buffer).decode("utf-8"))

video.release()
print(len(base64Frames), "frames read.")
Once we have the video frames, we craft our prompt and send a request to GPT (Note that we
don't need to send every frame for GPT to understand what's going on):
PROMPT_MESSAGES = [
{
"role": "user",
"content": [
"These are frames from a video that I want to upload. Generate a compelling description t
*map(lambda x: {"image": x, "resize": 768}, base64Frames[0::50]),
],
},
]
params = {
"model": "gpt-4-vision-preview",
"messages": PROMPT_MESSAGES,
"max_tokens": 200,
}
result = client.chat.completions.create(**params)
print(result.choices[0].message.content)
" 🐺 Survival of the Fittest: An Epic Tale in the Snow ❄️ - Witness the intense drama of nature
Remember to respect wildlife and nature. This video may contain scenes that some viewers might
Let's create a voiceover for this video in the style of David Attenborough. Using the same video
frames we prompt GPT to give us a short script:
PROMPT_MESSAGES = [
{
"role": "user",
"content": [
"These are frames of a video. Create a short voiceover script in the style of David Atten
*map(lambda x: {"image": x, "resize": 768}, base64Frames[0::60]),
],
},
]
params = {
"model": "gpt-4-vision-preview",
"messages": PROMPT_MESSAGES,
"max_tokens": 500,
}
result = client.chat.completions.create(**params)
print(result.choices[0].message.content)
In the vast, white expanse of the northern wilderness, a drama as old as time unfolds. Here, am
As tension crackles in the frozen air, the wolves close in, their eyes locked on their target.
In an instant, the quiet of the icy landscape is shattered. The bison charges, a desperate bid
It's an epic battle, a testament to the harsh realities of nature. In these moments, there is n
With the setting sun casting long shadows over the snow, the outcome is inevitable. Nature, in
Now we can pass the script to the TTS API where it will generate an mp3 of the voiceover:
response = requests.post(
"https://api.openai.com/v1/audio/speech",
headers={
"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
},
json={
"model": "tts-1-1106",
"input": result.choices[0].message.content,
"voice": "onyx",
},
)
from IPython.display import Audio

audio = b""
for chunk in response.iter_content(chunk_size=1024 * 1024):
    audio += chunk

Audio(audio)
This notebook guides you step-by-step through answering questions about a collection of data,
using Chroma, an open-source embeddings database, along with OpenAI's text embeddings
and chat completion API's.
Additionally, this notebook demonstrates some of the tradeoffs in making a question answering
system more robust. As we shall see, simple querying doesn't always create the best results!
Large language models (LLMs) like OpenAI's ChatGPT can be used to answer questions about
data that the model may not have been trained on or have access to, for example private data
or data created after the model's training cutoff.
In order to overcome this limitation, we can use a data store which is amenable to querying in
natural language, just like the LLM itself. An embeddings store like Chroma represents
documents as embeddings, alongside the documents themselves.
By embedding a text query, Chroma can find relevant documents, which we can then pass to
the LLM to answer our question. We'll show detailed examples and variants of this approach.
Note: you may need to restart the kernel to use updated packages.
We use OpenAI's API's throughout this notebook. You can get an API key from
https://beta.openai.com/account/api-keys
You can add your API key as an environment variable by executing the command export
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx in a terminal. Note that you will
need to reload the notebook if the environment variable wasn't set yet. Alternatively, you can
set it in the notebook, see below.
import os

# Uncomment the following line to set the environment variable in the notebook
# os.environ["OPENAI_API_KEY"] = 'sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'

if os.getenv("OPENAI_API_KEY") is not None:
    print("OPENAI_API_KEY is ready")
else:
    print("OPENAI_API_KEY environment variable not found")

OPENAI_API_KEY is ready
Dataset
Throughout this notebook, we use the SciFact dataset. This is a curated dataset of expert
annotated scientific claims, with an accompanying text corpus of paper titles and abstracts. Each
claim may be supported, contradicted, or not have enough evidence either way, according to
the documents in the corpus.
Having the corpus available as ground-truth allows us to investigate how well the following
approaches to LLM question answering perform.
data_path = '../../data'

   id  claim                                               evidence                                            cited_doc_ids
1  3   1,000 genomes project enables mapping of genet...  {'14717500': [{'sentences': [2, 5], 'label': '...  [14717500]
We construct an appropriate prompt, with some example facts, then query the model with each
claim in the dataset. We ask the model to assess a claim as 'True', 'False', or 'NEE' if there is not
enough evidence one way or the other.
def build_prompt(claim):
    return [
{"role": "system", "content": "I will ask you to assess a scientific claim. Output only the t
{"role": "user", "content": f"""
Example:
Claim:
0-dimensional biomaterials show inductive properties.
Assessment:
False
Claim:
1/2000 in UK have abnormal PrP positivity.
Assessment:
True
Claim:
Aspirin inhibits the production of PGE2.
Assessment:
False
Claim:
{claim}
Assessment:
"""}
]
def assess_claims(claims):
    responses = []
    # Query the OpenAI API
    for claim in claims:
        response = openai.ChatCompletion.create(
            model='gpt-3.5-turbo',
            messages=build_prompt(claim),
            max_tokens=3,
        )
        # Strip any punctuation or whitespace from the response
        responses.append(response.choices[0].message.content.strip('., '))
    return responses
claims = samples['claim'].tolist()
We evaluate the ground-truth according to the dataset. From the dataset description, each
claim is either supported or contradicted by the evidence, or else there isn't enough evidence
either way.
def get_groundtruth(evidence):
    groundtruth = []
    for e in evidence:
        # Evidence is empty
        if len(e) == 0:
            groundtruth.append('NEE')
        else:
            # In this dataset, all evidence for a given claim is consistent, either SUPPORT or CONTRA
            if list(e.values())[0][0]['label'] == 'SUPPORT':
                groundtruth.append('True')
            else:
                groundtruth.append('False')
    return groundtruth
evidence = samples['evidence'].tolist()
groundtruth = get_groundtruth(evidence)
We also output the confusion matrix, comparing the model's assessments with the ground
truth, in an easy to read table.
def confusion_matrix(inferred, groundtruth):
    # A minimal sketch: tabulate model assessments (rows) against the groundtruth labels (columns)
    import pandas as pd
    labels = ['True', 'False', 'NEE']
    confusion = pd.DataFrame(0, index=labels, columns=pd.Index(labels, name='Groundtruth'))
    for inf, gt in zip(inferred, groundtruth):
        confusion.loc[inf.strip('., '), gt] += 1
    return confusion
We ask the model to directly assess the claims, without additional context.
gpt_inferred = assess_claims(claims)
confusion_matrix(gpt_inferred, groundtruth)
Groundtruth
True False NEE
True 15 5 14
False 0 2 1
NEE 3 3 7
Results
From these results we see that the LLM is strongly biased to assess claims as true, even when
they are false, and also tends to assess false claims as not having enough evidence. Note that
'not enough evidence' is with respect to the model's assessment of the claim in a vacuum,
without additional context.
Adding context
We now add the additional context available from the corpus of paper titles and abstracts. This
section shows how to load a text corpus into Chroma, using OpenAI text embeddings.
   doc_id  title                                               abstract                                            structured
2  7912    BC1 RNA, the transcript from a master gene for...  [ID elements are short interspersed elements (...  False
3  18670   The DNA Methylome of Human Peripheral Blood Mo...   [DNA methylation plays an important role in bi...  False
4  19238   The human myelin basic protein gene is include...   [Two human Golli (for gene expressed in the ol...  False
The next step is to load the corpus into Chroma. Given an embedding function, Chroma will
automatically handle embedding each document, and will store it alongside its text and
metadata, making it simple to query.
We instantiate a (ephemeral) Chroma client, and create a collection for the SciFact title and
abstract corpus. Chroma can also be instantiated in a persisted configuration; learn more at the
Chroma docs.
import chromadb
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
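A minimal sketch of the client and collection setup (the collection name and embedding model are assumptions):
import os

# In-memory (ephemeral) Chroma client; see the Chroma docs for a persistent setup
chroma_client = chromadb.Client()

embedding_function = OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small",
)
scifact_corpus_collection = chroma_client.create_collection(
    name="scifact_corpus", embedding_function=embedding_function
)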
Next we load the corpus into Chroma. Because this data loading is memory intensive, we
recommend using a batched loading scheme in batches of 50-1000. For this example it should
take just over one minute for the entire corpus. It's being embedded in the background,
automatically, using the embedding_function we specified earlier.
batch_size = 100
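The loading loop itself is not shown; a sketch, assuming the corpus lives in a DataFrame corpus_df with doc_id, title, abstract and structured columns (these names are assumptions):
for i in range(0, len(corpus_df), batch_size):
    batch = corpus_df[i:i + batch_size]
    scifact_corpus_collection.add(
        ids=batch["doc_id"].astype(str).tolist(),
        documents=(batch["title"] + ". " + batch["abstract"].apply(" ".join)).tolist(),
        metadatas=[{"structured": s} for s in batch["structured"].tolist()],
    )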
Retrieving context
Next we retrieve documents from the corpus which may be relevant to each claim in our
sample. We want to provide these as context to the LLM for evaluating the claims. We retrieve
the 3 most relevant documents for each claim, according to the embedding distance.
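The retrieval itself is a single Chroma query over all claims at once, asking for 3 results per claim:
claim_query_result = scifact_corpus_collection.query(
    query_texts=claims, include=["documents", "distances"], n_results=3
)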
We create a new prompt, this time taking into account the additional context we retrieve from
the corpus.
{' '.join(context)}
Assess the following claim on the basis of the evidence. Output only the text 'True' if the claim is
Claim:
{claim}
Assessment:
"""}]
return responses
Then ask the model to evaluate the claims with the retrieved context.
Groundtruth
True False NEE
True 16 2 8
False 1 6 5
NEE 1 2 9
Results
We see that the model is a lot less likely to evaluate a False claim as true (2 instances VS 5
previously), but that claims without enough evidence are still often assessed as True or False.
Taking a look at the retrieved documents, we see that they are sometimes not relevant to the
claim - this causes the model to be confused by the extra information, and it may decide that
sufficient evidence is present, even when the information is irrelevant. This happens because we
always ask for the 3 'most' relevant documents, but these might not be relevant at all beyond a
certain point.
Along with the documents themselves, Chroma returns a distance score. We can try
thresholding on distance, so that fewer irrelevant documents make it into the context we
provide the model.
If, after filtering on the threshold, no context documents remain, we bypass the model and
simply return that there is not enough evidence.
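A minimal sketch of such a filter; the threshold value and the exact structure of Chroma's query result (parallel lists of ids, documents, and distances per claim) are assumptions:

def filter_query_result(query_result, distance_threshold=0.36):  # threshold value is an assumption
    # For each claim, drop retrieved documents whose embedding distance exceeds the threshold.
    for ids, docs, distances in zip(query_result['ids'], query_result['documents'], query_result['distances']):
        for i in range(len(ids) - 1, -1, -1):
            if distances[i] > distance_threshold:
                ids.pop(i)
                docs.pop(i)
                distances.pop(i)
    return query_result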
filtered_claim_query_result = filter_query_result(claim_query_result)
                     Groundtruth
Assessment      True    False    NEE
True              10        2      1
False              0        2      1
NEE                8        6     20
Results
The model now assesses many fewer claims as True or False when there is not enough evidence
present. However, it now biases away from certainty. Most claims are now assessed as having
not enough evidence, because a large fraction of them are filtered out by the distance
threshold. It's possible to tune the distance threshold to find the optimal operating point, but
this can be difficult, and is dataset and embedding model dependent.
Until now, we have queried the dataset using claims which are single sentence statements, while
the corpus contains abstracts describing a scientific paper. Intuitively, while these might be
related, there are significant differences in their structure and meaning. These differences are
encoded by the embedding model, and so influence the distances between the query and the
most relevant results.
We can overcome this by leveraging the power of LLMs to generate relevant text. While the
facts might be hallucinated, the content and structure of the documents the models generate is
more similar to the documents in our corpus, than the queries are. This could lead to better
queries and hence better results.
This approach is called Hypothetical Document Embeddings (HyDE), and has been shown to
be quite good at the retrieval task. It should help us bring more relevant information into the
context, without polluting it.
TL;DR:
You get much better matches when you embed whole abstracts rather than single sentences.
HyDE shows that using GPT-3 to expand claims into hallucinated abstracts, and then searching based on those abstracts (claims -> abstracts -> results), works better than searching directly (claims -> results).
First, we use in-context examples to prompt the model to generate documents similar to what's
in the corpus, for each claim we want to assess.
def build_hallucination_prompt(claim):
return [{'role': 'system', 'content': """I will ask you to write an abstract for a scientific pap
An Example:
Claim:
A high microerythrocyte count raises vulnerability to severe anemia in homozygous alpha (+)- thal
Abstract:
BACKGROUND The heritable haemoglobinopathy alpha(+)-thalassaemia is caused by the reduced synthes
METHODS AND FINDINGS Data from children living on the north coast of Papua New Guinea who had par
CONCLUSIONS The increased erythrocyte count and microcytosis in children homozygous for alpha(+)-
End of example.
Claim:
{claim}
Abstract:
"""}]
def hallucinate_evidence(claims):
responses = []
# Query the OpenAI API once per claim
for claim in claims:
response = openai.ChatCompletion.create(
model='gpt-3.5-turbo',
messages=build_hallucination_prompt(claim),
)
responses.append(response.choices[0].message.content)
return responses
NB: This can take a while, about 30m for 100 claims. You can reduce the number of claims we
want to assess to get results more quickly.
hallucinated_evidence = hallucinate_evidence(claims)
We use the hallucinated documents as queries into the corpus, and filter the results using the
same distance threshold.
We then ask the model to assess the claims, using the new context.
                     Groundtruth
Assessment      True    False    NEE
True              15        2      5
False              1        5      4
NEE                2        3     13
Results
Combining HyDE with a simple distance threshold leads to a significant improvement. The model is no longer biased toward assessing claims as True, nor toward judging that there is not enough evidence. It also correctly assesses when there isn't enough evidence more often.
Conclusion
Equipping LLMs with a context based on a corpus of documents is a powerful technique for bringing the general reasoning and natural language interactions of LLMs to your own data. However, it's important to know that naive query and retrieval may not produce the best possible results! Ultimately, understanding your data will help you get the most out of the retrieval-based question-answering approach.
We are addressing the problem of enhancing the precision of transcriptions, particularly when it
comes to company names and product references. Our solution involves a dual strategy that
utilizes both the Whisper prompt parameter and GPT-4's post-processing capabilities.
We input a list of correct spellings directly into Whisper's prompt parameter to guide the
initial transcription.
We utilized GPT-4 to fix misspellings post transcription, again using the same list of correct
spellings in the prompt.
Setup
Import the OpenAI Python library (if you don't have it, you'll need to install it with pip
install openai )
# imports
from openai import OpenAI # for making OpenAI API calls
import urllib # for downloading example audio files
import os # for accessing environment variables
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
Our reference point is a monologue, which was generated by ChatGPT from prompts given by
the author. The author then voiced this content. So, the author both guided the ChatGPT's
output with prompts and brought it to life by speaking it.
Our fictitious company, ZyntriQix, offers a range of tech products. These include Digique Plus,
CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, and DigiFractal Matrix. We also
spearhead several initiatives such as PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., and F.L.I.N.T.
Whisper transcribed our company name and product names incorrectly and miscapitalized our acronyms. Let's pass the correct names as a list in the prompt.
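The transcription call is not shown above; a minimal sketch of guiding Whisper through the prompt parameter (the helper name, and ZyntriQix_filepath as the path to the audio file used later in this example, are assumptions):

def transcribe(prompt: str, audio_filepath: str) -> str:
    """Transcribe an audio file, optionally biasing Whisper with a prompt of correct spellings."""
    with open(audio_filepath, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            model="whisper-1",
            prompt=prompt,
            file=audio_file,
        )
    return transcript.text

transcription = transcribe(
    prompt="ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T.",
    audio_filepath=ZyntriQix_filepath,
)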
"Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, V
When passing the list of product names, some of the product names are transcribed correctly
while others are still misspelled.
"Have you heard of ZentricX? This tech giant boasts products like DigiCube Plus, Synapse 5, Vor
Leveraging GPT-4 proves especially useful when the speech content is unknown beforehand
and we have a list of product names readily available.
The post-processing technique using GPT-4 is notably more scalable than depending solely on
Whisper's prompt parameter, which has a token limit of 244. GPT-4 allows us to process larger
lists of correct spellings, making it a more robust method for handling extensive product lists.
However, this post-processing technique isn't without limitations. It's constrained by the context
window of the chosen model, which may pose challenges when dealing with vast numbers of
unique terms. For instance, companies with thousands of SKUs may find that the context
window of GPT-4 is insufficient to handle their requirements, and they might need to explore
alternative solutions.
Interestingly, the GPT-4 post-processing technique seems more reliable than using Whisper
alone. This method, which leverages a product list, enhances the reliability of our results.
However, this increased reliability comes at a price, as using this approach can increase costs
and can result in higher latency.
Now, let's input the original product list into GPT-4 and evaluate its performance. By doing so,
we aim to assess the AI model's ability to correctly spell the proprietary product names, even
with no prior knowledge of the exact terms to appear in the transcription. In our experiment,
GPT-4 was successful in correctly spelling our product names, confirming its potential as a
reliable tool for ensuring transcription accuracy.
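transcribe_with_spellcheck is used below but its definition is not shown; a minimal sketch, assuming a raw transcription is produced first (here via the transcribe helper sketched earlier) and then corrected by GPT-4:

def transcribe_with_spellcheck(system_message: str, audio_filepath: str) -> str:
    # Get an unguided transcription, then let GPT-4 correct it using the system message.
    raw_transcript = transcribe(prompt="", audio_filepath=audio_filepath)
    completion = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": raw_transcript},
        ],
    )
    return completion.choices[0].message.content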
system_prompt = "You are a helpful assistant for the company ZyntriQix. Your task is to correct any s
new_text = transcribe_with_spellcheck(system_prompt, audio_filepath=ZyntriQix_filepath)
print(new_text)
Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, Vo
In this case, we supplied a comprehensive product list that included all the previously used spellings, along with additional new names. This scenario simulates a real-life situation where we have a substantial SKU list and are uncertain about the exact terms that will appear in the transcription. Feeding this extensive list of product names into the system resulted in a correctly transcribed output.
system_prompt = "You are a helpful assistant for the company ZyntriQix. Your task is to correct any s
new_text = transcribe_with_spellcheck(system_prompt, audio_filepath=ZyntriQix_filepath)
print(new_text)
Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, Vo
We are employing GPT-4 as a spell checker, using the same list of correct spellings that was
previously used in the prompt.
system_prompt = "You are a helpful assistant for the company ZyntriQix. Your first task is to list th
new_text = transcribe_with_spellcheck(system_prompt, audio_filepath=ZyntriQix_filepath)
print(new_text)
The misspelled words are: ZentricX, Digi-Q+, Synapse 5, VortiCore V8, Echo Nix Array, Orbital L
Have you heard of ZyntriQix? This tech giant boasts products like Digique Plus, CynapseFive, Vo
Redis as a Context Store with OpenAI Chat
Michael Yuan
May 10, 2023
This notebook demonstrates how to use Redis as high-speed context memory with ChatGPT.
Prerequisites
Redis instance with the Redis Search and Redis JSON modules
Redis-py client lib
Installation
OPENAI_API_KEY=your_key
OpenAI Setup
Key load + helper function for chat completion
import openai
import os
from dotenv import load_dotenv
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")
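The helper itself is not shown above; a minimal sketch (the model choice and temperature are assumptions):

def get_completion(prompt, model="gpt-3.5-turbo"):
    """Return the chat completion for a single user prompt."""
    messages = [{"role": "user", "content": prompt}]
    response = openai.ChatCompletion.create(
        model=model,
        messages=messages,
        temperature=0,  # deterministic output
    )
    return response["choices"][0]["message"]["content"]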
gpt-3.5-turbo was trained on data up to Sep 2021. Let's ask it a question about something that happened after that date - in this case, the FTX/Sam Bankman-Fried scandal.
Incomplete Information
prompt ="Is Sam Bankman-Fried's company, FTX, considered a well-managed company? If you don't know f
response = get_completion(prompt)
print(response)
Additional Context
Another way to combat incomplete information is to give the system more information such
that it can make intelligent decisions vs guessing. We'll use Redis as the source for that
additional context. We'll pull in business news articles from after the GPT knowledge cut-off
date such that the system will have a better understanding of how FTX was actually managed.
! docker compose up -d
from redis import from_url

REDIS_URL = 'redis://localhost:6379'
client = from_url(REDIS_URL)
client.ping()
True
Create Index
FT.CREATE
from redis.commands.search.field import TextField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType

schema = [ VectorField('$.vector',
"FLAT",
{ "TYPE": 'FLOAT32',
"DIM": 1536,
"DISTANCE_METRIC": "COSINE"
}, as_name='vector' ),
TextField('$.content', as_name='content')
]
idx_def = IndexDefinition(index_type=IndexType.JSON, prefix=['doc:'])
try:
client.ft('idx').dropindex()
except:
pass
client.ft('idx').create_index(schema, definition=idx_def)
b'OK'
Load Data Files into Redis as JSON Objects with Text and
Vector Fields
Redis JSON
import os
import openai
directory = './assets/'
model='text-embedding-3-small'
i = 1
for file in os.listdir(directory):
with open(os.path.join(directory, file)) as f:
content = f.read()
vector = openai.Embedding.create(input = [content], model = model)['data'][0]['embedding']
client.json().set(f'doc:{i}', '$', {'content': content, 'vector': vector})
i += 1
KNN Search
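The code for the KNN query itself is not shown here; a minimal sketch, assuming an example query string and K=10 (the context variable built at the end is what the prompt below interpolates):

import numpy as np
from redis.commands.search.query import Query

user_query = "How was FTX managed?"  # example query text
query_vector = openai.Embedding.create(input=[user_query], model=model)['data'][0]['embedding']

knn_query = (
    Query('(*)=>[KNN 10 @vector $query_vec AS score]')
    .sort_by('score')
    .return_fields('content', 'score')
    .dialect(2)
)
query_params = {"query_vec": np.array(query_vector, dtype=np.float32).tobytes()}

results = client.ft('idx').search(knn_query, query_params=query_params)
context = '\n'.join(doc.content for doc in results.docs)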
In a statement on Twitter, the company said that Mr. Bankman-Fried had resigned, with John J. R
The speed of FTX’s downfall has left crypto insiders stunned. Just days ago, Mr. Bankman-Fried
“Here we are, with one of the richest people in the world, his net worth dropping to zero, his
Now, the bankruptcy has set up a rush among investors and customers to salvage funds from what
FTX’s collapse has destabilized the crypto industry, which was already reeling from a crash in
Mr. Bankman-Fried was backed by some of the highest-profile venture capital investors in Silico
The company’s demise has also set off a reckoning over risky practices that have become pervasi
“I’m really sorry, again, that we ended up here,” Mr. Bankman-Fried said on Twitter on Friday.
The bankruptcy filing marks the start of what will probably be months or even years of legal fa
The bankruptcy filing included FTX, its U.S. arm and Alameda. According to a bare-bones legal f
The bankruptcy is a stunning fall from grace for the 30-year-old Mr. Bankman-Fried, who cultiva
Now that we have relevant context, add that to the prompt to OpenAI and get a very different
response.
prompt = f"""
Using the information delimited by triple backticks, answer this question: Is Sam Bankman-Fried's com
Context: ```{context}```
"""
response = get_completion(prompt)
print(response)
No, Sam Bankman-Fried's company FTX is not considered a well-managed company as it has filed fo
This notebook is an example on how you can use SingleStoreDB vector storage and functions to
build an interactive Q&A application with ChatGPT. If you start a Trial in SingleStoreDB, you can
find the same notebook in our sample notebooks with native connection.
First let's talk directly to ChatGPT and try and get back a
response
import openai
EMBEDDING_MODEL = "text-embedding-3-small"
GPT_MODEL = "gpt-3.5-turbo"
response = openai.ChatCompletion.create(
model=GPT_MODEL,
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Who won the gold medal for curling in Olymics 2022?"},
]
)
print(response['choices'][0]['message']['content'])
I'm sorry, I cannot provide information about events that have not occurred yet. The Winter Oly
import pandas as pd
import os
import wget
import ast
if not os.path.exists(file_path):
wget.download(embeddings_path, file_path)
print("File downloaded successfully.")
else:
print("File already exists in the local file system.")
File downloaded successfully.
df = pd.read_csv(
"winter_olympics_2022.csv"
)
df
df.info(show_counts=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6059 entries, 0 to 6058
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 text 6059 non-null object
1 embedding 6059 non-null object
dtypes: object(2)
memory usage: 94.8+ KB
2. Set up SingleStore DB
import singlestoredb as s2
conn = s2.connect("<user>:<Password>@<host>:3306/")
cur = conn.cursor()
# Create database
stmt = """
CREATE DATABASE IF NOT EXISTS winter_wikipedia2;
"""
cur.execute(stmt)
1
#create table
stmt = """
CREATE TABLE IF NOT EXISTS winter_wikipedia2.winter_olympics_2022 (
id INT PRIMARY KEY,
text TEXT CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,
embedding BLOB
);"""
cur.execute(stmt)
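The step that populates the table is not shown above. A minimal sketch, assuming the singlestoredb cursor accepts %s-style parameters and that the embedding column in the CSV is already a JSON-style array string:

insert_stmt = """
    INSERT INTO winter_wikipedia2.winter_olympics_2022 (id, text, embedding)
    VALUES (%s, %s, JSON_ARRAY_PACK(%s))
"""
# Each embedding string is packed into a binary vector by JSON_ARRAY_PACK on the server side.
records = [(i, row.text, row.embedding) for i, row in df.iterrows()]
cur.executemany(insert_stmt, records)
conn.commit()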
%%time
def strings_ranked_by_relatedness(
query: str,
df: pd.DataFrame,
relatedness_fn=lambda x, y: 1 - spatial.distance.cosine(x, y),
top_n: int = 100
) -> tuple:
"""Returns a list of strings and relatednesses, sorted from most related to least."""
strings = []
relatednesses = []
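The body of strings_ranked_by_relatedness is truncated above. One way to finish it against SingleStoreDB is sketched here; the table_name parameter, the %s paramstyle, and the use of JSON_ARRAY_PACK / DOT_PRODUCT are assumptions rather than the notebook's exact code:

import json

def strings_ranked_by_relatedness(
    query: str,
    df: pd.DataFrame,  # kept for signature compatibility with the calls below; unused here
    table_name: str = "winter_olympics_2022",
    top_n: int = 100,
) -> tuple:
    """Returns a list of strings and relatednesses, sorted from most related to least."""
    # Embed the query with the same model used for the stored embeddings.
    query_embedding = openai.Embedding.create(
        model=EMBEDDING_MODEL, input=query
    )["data"][0]["embedding"]

    stmt = f"""
        SELECT text, DOT_PRODUCT(JSON_ARRAY_PACK(%s), embedding) AS score
        FROM winter_wikipedia2.{table_name}
        ORDER BY score DESC
        LIMIT %s
    """
    cur.execute(stmt, (json.dumps(query_embedding), top_n))
    rows = cur.fetchall()
    strings = [row[0] for row in rows]
    relatednesses = [row[1] for row in rows]
    return strings, relatednesses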
import tiktoken
def query_message(
query: str,
df: pd.DataFrame,
model: str,
token_budget: int
) -> str:
"""Return a message for GPT, with relevant source texts pulled from SingleStoreDB."""
strings, relatednesses = strings_ranked_by_relatedness(query, df, "winter_olympics_2022")
introduction = 'Use the below articles on the 2022 Winter Olympics to answer the subsequent quest
question = f"\n\nQuestion: {query}"
message = introduction
for string in strings:
next_article = f'\n\nWikipedia article section:\n"""\n{string}\n"""'
if (
num_tokens(message + next_article + question, model=model)
> token_budget
):
break
else:
message += next_article
return message + question
def ask(
query: str,
df: pd.DataFrame = df,
model: str = GPT_MODEL,
token_budget: int = 4096 - 500,
print_message: bool = False,
) -> str:
"""Answers a query using GPT and a table of relevant texts and embeddings in SingleStoreDB."""
message = query_message(query, df, model=model, token_budget=token_budget)
if print_message:
print(message)
messages = [
{"role": "system", "content": "You answer questions about the 2022 Winter Olympics."},
{"role": "user", "content": message},
]
response = openai.ChatCompletion.create(
model=model,
messages=messages,
temperature=0
)
response_message = response["choices"][0]["message"]["content"]
return response_message
answer = ask('Who won the gold medal for curling in Olympics 2022?')
pprint(answer)
("There were three curling events at the 2022 Winter Olympics: men's, women's, "
'and mixed doubles. The gold medalists for each event are:\n'
'\n'
"- Men's: Sweden (Niklas Edin, Oskar Eriksson, Rasmus Wranå, Christoffer "
'Sundgren, Daniel Magnusson)\n'
"- Women's: Great Britain (Eve Muirhead, Vicky Wright, Jennifer Dodds, Hailey "
'Duff, Mili Smith)\n'
'- Mixed doubles: Italy (Stefania Constantini, Amos Mosaner)')
OpenAI's embedding models cannot embed text that exceeds a maximum length. The
maximum length varies by model, and is measured by tokens, not string length. If you are
unfamiliar with tokenization, check out How to count tokens with tiktoken.
This notebook shows how to handle texts that are longer than a model's maximum context
length. We'll demonstrate using embeddings from text-embedding-3-small , but the same
ideas can be applied to other models and tasks. To learn more about embeddings, check out
the OpenAI Embeddings Guide.
First, we select the model and define a function to get embeddings from the API.
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
EMBEDDING_MODEL = 'text-embedding-3-small'
EMBEDDING_CTX_LENGTH = 8191
EMBEDDING_ENCODING = 'cl100k_base'
# let's make sure to not retry on an invalid request, because that is what we want to demonstrate
@retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6), retry=retry_if_not_ex
def get_embedding(text_or_tokens, model=EMBEDDING_MODEL):
return client.embeddings.create(input=text_or_tokens, model=model).data[0].embedding
The text-embedding-3-small model has a context length of 8191 tokens with the cl100k_base
encoding, and we can see that going over that limit causes an error.
Error code: 400 - {'error': {'message': "This model's maximum context length is 8192 tokens, ho
Clearly we want to avoid these errors, particularly when handling a large number of embeddings programmatically. Yet, we still might be faced with texts that are longer than the maximum context length. Below we describe and provide recipes for the main approaches to handling these longer texts: (1) simply truncating the text to the maximum allowed length, and (2) chunking the text and embedding each chunk individually.
The simplest solution is to truncate the input text to the maximum allowed length. Because the
context length is measured in tokens, we have to first tokenize the text before truncating it. The
API accepts inputs both in the form of text or tokens, so as long as you are careful that you are
using the appropriate encoding, there is no need to convert the tokens back into string form.
Below is an example of such a truncation function.
import tiktoken
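A sketch of such a truncation function, using the constants defined above (the exact implementation in the original notebook may differ):

def truncate_text_tokens(text, encoding_name=EMBEDDING_ENCODING, max_tokens=EMBEDDING_CTX_LENGTH):
    """Truncate a string to have `max_tokens` tokens according to the given encoding."""
    encoding = tiktoken.get_encoding(encoding_name)
    return encoding.encode(text)[:max_tokens]  # returns tokens, which the embeddings API accepts directly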
truncated = truncate_text_tokens(long_text)
len(get_embedding(truncated))
1536
Though truncation works, discarding potentially relevant text is a clear drawback. Another
approach is to divide the input text into chunks and then embed each chunk individually. Then,
we can either use the chunk embeddings separately, or combine them in some way, such as
averaging (weighted by the size of each chunk).
We will take a function from Python's own cookbook that breaks up a sequence into chunks.
Now we define a function that encodes a string into tokens and then breaks it up into chunks.
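Sketches of both helpers described above; batched follows the itertools recipes in the Python documentation, and chunked_tokens is an assumption about how the tokenized text is split:

from itertools import islice

def batched(iterable, n):
    """Batch data into tuples of length n. The last batch may be shorter."""
    # batched('ABCDEFG', 3) --> ABC DEF G
    if n < 1:
        raise ValueError('n must be at least one')
    it = iter(iterable)
    while (batch := tuple(islice(it, n))):
        yield batch

def chunked_tokens(text, encoding_name, chunk_length):
    """Encode a string and yield successive chunks of at most chunk_length tokens."""
    encoding = tiktoken.get_encoding(encoding_name)
    tokens = encoding.encode(text)
    yield from batched(tokens, chunk_length)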
Finally, we can write a function that safely handles embedding requests, even when the input
text is longer than the maximum context length, by chunking the input tokens and embedding
each chunk individually. The average flag can be set to True to return the weighted average
of the chunk embeddings, or False to simply return the unmodified list of chunk embeddings.
import numpy as np
if average:
chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
chunk_embeddings = chunk_embeddings / np.linalg.norm(chunk_embeddings) # normalizes length t
chunk_embeddings = chunk_embeddings.tolist()
return chunk_embeddings
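The snippet above shows only the averaging step; a fuller sketch of the chunked embedding function it belongs to (the argument names are assumptions, and it relies on the get_embedding and chunked_tokens helpers defined earlier):

def len_safe_get_embedding(text, model=EMBEDDING_MODEL, max_tokens=EMBEDDING_CTX_LENGTH,
                           encoding_name=EMBEDDING_ENCODING, average=True):
    chunk_embeddings = []
    chunk_lens = []
    # Embed each chunk of at most max_tokens tokens individually.
    for chunk in chunked_tokens(text, encoding_name=encoding_name, chunk_length=max_tokens):
        chunk_embeddings.append(get_embedding(chunk, model=model))
        chunk_lens.append(len(chunk))

    if average:
        # Weighted average of the chunk embeddings, renormalized to unit length.
        chunk_embeddings = np.average(chunk_embeddings, axis=0, weights=chunk_lens)
        chunk_embeddings = chunk_embeddings / np.linalg.norm(chunk_embeddings)
        chunk_embeddings = chunk_embeddings.tolist()
    return chunk_embeddings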
Setting average=True gives us a single 1536-dimensional embedding vector for our long text.
Setting average=False gives us 2 embedding vectors, one for each of the chunks.
In some cases, it may make sense to split chunks on paragraph boundaries or sentence
boundaries to help preserve the meaning of the text.
This notebook guides you step by step on using PolarDB-PG as a vector database for OpenAI
embeddings.
4. Using PolarDB-PG to perform the nearest neighbour search in the created collection.
What is PolarDB-PG
PolarDB-PG is a high-performance vector database with a read-write separation architecture. It is a cloud-native database managed by Alibaba Cloud, 100% compatible with PostgreSQL and highly compatible with Oracle syntax. It supports storing and querying massive amounts of vector data, and greatly improves the efficiency of vector calculations through optimized underlying execution algorithms, providing users with fast, elastic, high-performance, secure, and reliable vector database services with massive storage. PolarDB-PG also supports multi-dimensional and multi-modal spatiotemporal and geographic information engines, and comes with complete OLAP functionality and service level agreements; it has been adopted and recognized by many users.
Deployment options
Using PolarDB-PG Cloud Vector Database. Click here to fast deploy it.
Prerequisites
2. The psycopg2 library to interact with the vector database. Any other PostgreSQL client library will work as well.
You can validate that the server was launched successfully by connecting to it with a client, as shown below.
Install requirements
This notebook obviously requires the openai and psycopg2 packages, but there are also some
other additional libraries we will use. The following command installs them all:
Prepare your OpenAI API key The OpenAI API key is used for vectorization of the documents
and queries.
If you don't have an OpenAI API key, you can get one from
https://beta.openai.com/account/api-keys.
Once you get your key, please add it to your environment variables as OPENAI_API_KEY.
If you have any doubts about setting the API key through environment variables, please refer to
Best Practices for API Key Safety.
# Test that your OpenAI API key is correctly set as an environment variable
# Note. if you run this notebook locally, you will need to reload your terminal and the notebook for
OPENAI_API_KEY is ready
Connect to PolarDB
First add the connection parameters to your environment variables, or just change the psycopg2.connect parameters below.
Connecting to a running instance of PolarDB server is easy with the official Python library:
import os
import psycopg2
# Note. alternatively you can set a temporary env variable like this:
# os.environ["PGHOST"] = "your_host"
# os.environ["PGPORT"] = "5432"
# os.environ["PGDATABASE"] = "postgres"
# os.environ["PGUSER"] = "user"
# os.environ["PGPASSWORD"] = "password"
connection = psycopg2.connect(
host=os.environ.get("PGHOST", "localhost"),
port=os.environ.get("PGPORT", "5432"),
database=os.environ.get("PGDATABASE", "postgres"),
user=os.environ.get("PGUSER", "user"),
password=os.environ.get("PGPASSWORD", "password")
)
import wget
embeddings_url = "https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedde
'vector_database_wikipedia_articles_embedded.zip'
import zipfile
import os
import re
import tempfile
current_directory = os.getcwd()
zip_file_path = os.path.join(current_directory, "vector_database_wikipedia_articles_embedded.zip")
output_directory = os.path.join(current_directory, "../../data")
if os.path.exists(file_path):
print(f"The file {file_name} exists in the data directory.")
else:
print(f"The file {file_name} does not exist in the data directory.")
Index data
PolarDB stores data in relation where each object is described by at least one vector. Our
relation will be called articles and each object will be described by both title and content
vectors.
We will start with creating a relation and create a vector index on both title and content, and
then we will fill it with our precomputed embeddings.
create_table_sql = '''
CREATE TABLE IF NOT EXISTS public.articles (
id INTEGER NOT NULL,
url TEXT,
title TEXT,
content TEXT,
title_vector vector(1536),
content_vector vector(1536),
vector_id INTEGER
);
'''
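A minimal sketch of executing the statement with psycopg2 (the vector-index creation statements mentioned above are omitted here, since their exact syntax depends on the vector extension enabled on your PolarDB-PG instance):

cursor = connection.cursor()
cursor.execute(create_table_sql)
connection.commit()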
Load data
In this section we are going to load the data prepared previously, so you don't have to recompute the embeddings of Wikipedia articles with your own credits.
import io
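A minimal sketch of loading the unzipped CSV into the articles table with psycopg2's COPY support; the CSV path and column order are assumptions based on the dataset described elsewhere in this guide:

csv_file_path = '../../data/vector_database_wikipedia_articles_embedded.csv'

with open(csv_file_path, 'r', encoding='utf-8') as f:
    cursor = connection.cursor()
    # The CSV columns (id, url, title, text, title_vector, content_vector, vector_id)
    # are mapped positionally onto the table columns listed below.
    cursor.copy_expert(
        "COPY public.articles (id, url, title, content, title_vector, content_vector, vector_id) "
        "FROM STDIN WITH (FORMAT csv, HEADER true)",
        f,
    )
    connection.commit()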
# Check the collection size to make sure all the points have been stored
count_sql = """select count(*) from public.articles;"""
cursor.execute(count_sql)
result = cursor.fetchone()
print(f"Count:{result[0]}")
Count:25000
Search data
Once the data is loaded into PolarDB we will start querying the collection for the closest vectors. We may provide an additional parameter vector_name to switch from title-based to content-based search. Since the precomputed embeddings were created with the text-embedding-3-small OpenAI model, we also have to use it during search.
return results
import openai
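Only the tail of the search function ("return results") appears above. A hedged sketch of what it might look like, assuming pgvector-style syntax (the <=> cosine-distance operator) is available on the instance and that query embeddings are produced with the same model as the stored ones:

def query_polardb(query, collection_name='articles', vector_name='title_vector', top_k=20):
    # Create an embedding for the query with the same model used at indexing time.
    embedded_query = openai.Embedding.create(
        input=query,
        model='text-embedding-3-small',
    )['data'][0]['embedding']

    cursor = connection.cursor()
    cursor.execute(
        f"SELECT id, title, url, {vector_name} <=> %s::vector AS distance "
        f"FROM public.{collection_name} ORDER BY distance LIMIT %s",
        (str(embedded_query), top_k),
    )
    results = cursor.fetchall()
    return results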
This notebook shows how we prepared a dataset of Wikipedia articles for search, used in
Question_answering_using_embeddings.ipynb.
Procedure:
1. Collect: We download a few hundred Wikipedia articles about the 2022 Olympics
2. Chunk: Documents are split into short, mostly self-contained sections to be embedded
3. Embed: Each section is embedded with the OpenAI API
4. Store: Embeddings are saved in a CSV file (for large datasets, use a vector database)
0. Prerequisites
Import libraries
# imports
import mwclient # for downloading example Wikipedia articles
import mwparserfromhell # for splitting Wikipedia articles into sections
import openai # for generating embeddings
import os # for environment variables
import pandas as pd # for DataFrames to store article sections and embeddings
import re # for cutting <ref> links out of Wikipedia articles
import tiktoken # for counting tokens
Install any missing libraries with pip install in your terminal. E.g.,
pip install openai
(You can also do this in a notebook cell with !pip install openai .)
Note that the OpenAI library will try to read your API key from the OPENAI_API_KEY
environment variable. If you haven't already, set this environment variable by following these
instructions.
1. Collect documents
In this example, we'll download a few hundred Wikipedia articles related to the 2022 Winter
Olympics.
def titles_from_category(
category: mwclient.listing.Category, max_depth: int
) -> set[str]:
"""Return a set of page titles in a given Wiki category and its subcategories."""
titles = set()
for cm in category.members():
if type(cm) == mwclient.page.Page:
# ^type() used instead of isinstance() to catch match w/ no inheritance
titles.add(cm.name)
elif isinstance(cm, mwclient.listing.Category) and max_depth > 0:
deeper_titles = titles_from_category(cm, max_depth=max_depth - 1)
titles.update(deeper_titles)
return titles
CATEGORY_TITLE = "Category:2022 Winter Olympics"
WIKI_SITE = "en.wikipedia.org"

site = mwclient.Site(WIKI_SITE)
category_page = site.pages[CATEGORY_TITLE]
titles = titles_from_category(category_page, max_depth=1)
# ^note: max_depth=1 means we go one level deep in the category tree
print(f"Found {len(titles)} article titles in {CATEGORY_TITLE}.")
Found 731 article titles in Category:2022 Winter Olympics.
2. Chunk documents
Now that we have our reference documents, we need to prepare them for search.
Because GPT can only read a limited amount of text at once, we'll split each document into
chunks short enough to be read.
Clean up the text by removing reference tags (e.g., <ref> tags), whitespace, and super short sections
Prepend titles and subtitles to each section's text, to help GPT understand the context
If a section is long (say, > 1,600 tokens), we'll recursively split it into smaller sections, trying
to split along semantic boundaries like paragraphs
SECTIONS_TO_IGNORE = [
"See also",
"References",
"External links",
"Further reading",
"Footnotes",
"Bibliography",
"Sources",
"Citations",
"Literature",
"Footnotes",
"Notes and references",
"Photo gallery",
"Works cited",
"Photos",
"Gallery",
"Notes",
"References and sources",
"References and notes",
]
def all_subsections_from_section(
section: mwparserfromhell.wikicode.Wikicode,
parent_titles: list[str],
sections_to_ignore: set[str],
) -> list[tuple[list[str], str]]:
"""
From a Wikipedia section, return a flattened list of all nested subsections.
Each subsection is a tuple, where:
- the first element is a list of parent subtitles, starting with the page title
- the second element is the text of the subsection (but not any children)
"""
headings = [str(h) for h in section.filter_headings()]
title = headings[0]
if title.strip("=" + " ") in sections_to_ignore:
# ^wiki headings are wrapped like "== Heading =="
return []
titles = parent_titles + [title]
full_text = str(section)
section_text = full_text.split(title)[1]
if len(headings) == 1:
return [(titles, section_text)]
else:
first_subtitle = headings[1]
section_text = section_text.split(first_subtitle)[0]
results = [(titles, section_text)]
for subsection in section.get_sections(levels=[len(titles) + 1]):
results.extend(all_subsections_from_section(subsection, titles, sections_to_ignore))
return results
def all_subsections_from_title(
title: str,
sections_to_ignore: set[str] = SECTIONS_TO_IGNORE,
site_name: str = WIKI_SITE,
) -> list[tuple[list[str], str]]:
"""From a Wikipedia page title, return a flattened list of all nested subsections.
Each subsection is a tuple, where:
- the first element is a list of parent subtitles, starting with the page title
- the second element is the text of the subsection (but not any children)
"""
site = mwclient.Site(site_name)
page = site.pages[title]
text = page.text()
parsed_text = mwparserfromhell.parse(text)
headings = [str(h) for h in parsed_text.filter_headings()]
if headings:
summary_text = str(parsed_text).split(headings[0])[0]
else:
summary_text = str(parsed_text)
results = [([title], summary_text)]
for subsection in parsed_text.get_sections(levels=[2]):
results.extend(all_subsections_from_section(subsection, [title], sections_to_ignore))
return results
# clean text
def clean_section(section: tuple[list[str], str]) -> tuple[list[str], str]:
"""
Return a cleaned up section with:
- <ref>xyz</ref> patterns removed
- leading/trailing whitespace removed
"""
titles, text = section
text = re.sub(r"<ref.*?</ref>", "", text)
text = text.strip()
return (titles, text)
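keep_section is used below but was not shown above. A minimal sketch, together with the collection and cleaning steps it operates on (the 16-character cutoff is an assumption):

# find all pages and subsections, then clean them
wikipedia_sections = []
for title in titles:
    wikipedia_sections.extend(all_subsections_from_title(title))
wikipedia_sections = [clean_section(ws) for ws in wikipedia_sections]

def keep_section(section: tuple[list[str], str]) -> bool:
    """Return True if the section should be kept, False otherwise."""
    titles, text = section
    if len(text) < 16:  # drop sections too short to be useful for search
        return False
    return True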
original_num_sections = len(wikipedia_sections)
wikipedia_sections = [ws for ws in wikipedia_sections if keep_section(ws)]
print(f"Filtered out {original_num_sections-len(wikipedia_sections)} sections, leaving {len(wikipedia
There is no perfect recipe for splitting text into sections; some trade-offs to consider:
Longer sections may be better for questions that require more context
Longer sections may be worse for retrieval, as they may have more topics muddled
together
Shorter sections are better for reducing costs (which are proportional to the number of
tokens)
Shorter sections allow more sections to be retrieved, which may help with recall
Overlapping sections may help prevent answers from being cut by section boundaries
Here, we'll use a simple approach and limit sections to 1,600 tokens each, recursively halving
any sections that are too long. To avoid cutting in the middle of useful sentences, we'll split
along paragraph boundaries when possible.
def truncated_string(
string: str,
model: str,
max_tokens: int,
print_warning: bool = True,
) -> str:
"""Truncate a string to a maximum number of tokens."""
encoding = tiktoken.encoding_for_model(model)
encoded_string = encoding.encode(string)
truncated_string = encoding.decode(encoded_string[:max_tokens])
if print_warning and len(encoded_string) > max_tokens:
print(f"Warning: Truncated string from {len(encoded_string)} tokens to {max_tokens} tokens.")
return truncated_string
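num_tokens and halved_by_delimiter are used below but their definitions were dropped above. Minimal sketches (the token-balanced split is an assumption about how the halving works):

def num_tokens(text: str, model: str = GPT_MODEL) -> int:
    """Return the number of tokens in a string."""
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

def halved_by_delimiter(string: str, delimiter: str = "\n") -> list[str]:
    """Split a string in two, on a delimiter, trying to balance tokens on each side."""
    chunks = string.split(delimiter)
    if len(chunks) == 1:
        return [string, ""]  # no delimiter found
    elif len(chunks) == 2:
        return chunks  # no need to search for a halfway point
    else:
        total_tokens = num_tokens(string)
        halfway = total_tokens // 2
        best_diff = halfway
        # Walk through the chunks and stop once the left half stops getting closer to halfway.
        for i, chunk in enumerate(chunks):
            left = delimiter.join(chunks[: i + 1])
            left_tokens = num_tokens(left)
            diff = abs(halfway - left_tokens)
            if diff >= best_diff:
                break
            else:
                best_diff = diff
        left = delimiter.join(chunks[:i])
        right = delimiter.join(chunks[i:])
        return [left, right]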
def split_strings_from_subsection(
subsection: tuple[list[str], str],
max_tokens: int = 1000,
model: str = GPT_MODEL,
max_recursion: int = 5,
) -> list[str]:
"""
Split a subsection into a list of subsections, each with no more than max_tokens.
Each subsection is a tuple of parent titles [H1, H2, ...] and text (str).
"""
titles, text = subsection
string = "\n\n".join(titles + [text])
num_tokens_in_string = num_tokens(string)
# if length is fine, return string
if num_tokens_in_string <= max_tokens:
return [string]
# if recursion hasn't found a split after X iterations, just truncate
elif max_recursion == 0:
return [truncated_string(string, model=model, max_tokens=max_tokens)]
# otherwise, split in half and recurse
else:
titles, text = subsection
for delimiter in ["\n\n", "\n", ". "]:
left, right = halved_by_delimiter(text, delimiter=delimiter)
if left == "" or right == "":
# if either half is empty, retry with a more fine-grained delimiter
continue
else:
# recurse on each half
results = []
for half in [left, right]:
half_subsection = (titles, half)
half_strings = split_strings_from_subsection(
half_subsection,
max_tokens=max_tokens,
model=model,
max_recursion=max_recursion - 1,
)
results.extend(half_strings)
return results
# otherwise no split was found, so just truncate (should be very rare)
return [truncated_string(string, model=model, max_tokens=max_tokens)]
==History==
On 27 May 2010, [[President of Ukraine]] [[Viktor Yanukovych]] stated during a visit to [[Lviv]
On 5 November 2013, it was confirmed that Lviv was bidding to host the [[2022 Winter Olympics]]
On 30 June 2014, the International Olympic Committee announced "Lviv will turn its attention to
Ukraine's Deputy Prime Minister Oleksandr Vilkul said that the Winter Games "will be an impetus
Now that we've split our library into shorter self-contained strings, we can compute
embeddings for each.
EMBEDDING_MODEL = "text-embedding-3-small"
BATCH_SIZE = 1000 # you can submit up to 2048 embedding inputs per request
embeddings = []
for batch_start in range(0, len(wikipedia_strings), BATCH_SIZE):
batch_end = batch_start + BATCH_SIZE
batch = wikipedia_strings[batch_start:batch_end]
print(f"Batch {batch_start} to {batch_end-1}")
response = client.embeddings.create(model=EMBEDDING_MODEL, input=batch)
for i, be in enumerate(response.data):
assert i == be.index # double check embeddings are in same order as input
batch_embeddings = [e.embedding for e in response.data]
embeddings.extend(batch_embeddings)
Batch 0 to 999
Batch 1000 to 1999
Batch 2000 to 2999
Batch 3000 to 3999
Batch 4000 to 4999
Batch 5000 to 5999
Batch 6000 to 6999
(For larger datasets, use a vector database, which will be more performant.)
SAVE_PATH = "data/winter_olympics_2022.csv"
df.to_csv(SAVE_PATH, index=False)
This example shows how to use Azure OpenAI service models with your own data. The feature is
currently in preview.
Azure OpenAI on your data enables you to run supported chat models such as GPT-3.5-Turbo
and GPT-4 on your data without needing to train or fine-tune models. Running models on your
data enables you to chat on top of, and analyze your data with greater accuracy and speed. One
of the key benefits of Azure OpenAI on your data is its ability to tailor the content of
conversational AI. Because the model has access to, and can reference specific sources to
support its responses, answers are not only based on its pretrained knowledge but also on the
latest information available in the designated data source. This grounding data also helps the
model avoid generating responses based on outdated or incorrect information.
Azure OpenAI on your own data with Azure AI Search (f.k.a. Azure Cognitive Search) provides a
customizable, pre-built solution for knowledge retrieval, from which a conversational AI
application can be built. To see alternative methods for knowledge retrieval and semantic
search, check out the cookbook examples for vector databases.
How it works
Azure OpenAI on your own data connects the model with your data, giving it the ability to
retrieve and utilize data in a way that enhances the model's output. Together with Azure AI
Search, data is retrieved from designated data sources based on the user input and provided
conversation history. The data is then augmented and resubmitted as a prompt to the model,
giving the model contextual information it can use to generate a response.
See the Data, privacy, and security for Azure OpenAI Service for more information.
Prerequisites
To properly access the Azure OpenAI Service, we need to create the proper resources at the
Azure Portal (you can check a detailed guide on how to do this in the Microsoft Docs)
To use your own data with Azure OpenAI models, you will need:
1. Azure OpenAI access and a resource with a chat model deployed (for example, GPT-3.5-Turbo or GPT-4)
For a full walk-through on how to upload your documents to blob storage and create an index
using the Azure AI Studio, see this Quickstart.
Setup
In this example, we'll use dotenv to load our environment variables. To connect with Azure
OpenAI and the Search index, the following variables should be added to a .env file in
KEY=VALUE format:
AZURE_OPENAI_ENDPOINT - the Azure OpenAI endpoint. This can be found under "Keys and Endpoints" for your Azure OpenAI resource in the Azure Portal.
AZURE_OPENAI_API_KEY - the Azure OpenAI API key. This can be found under "Keys and Endpoints" for your Azure OpenAI resource in the Azure Portal. Omit if using Azure Active Directory authentication (see below Authentication using Microsoft Active Directory).
SEARCH_ENDPOINT - the AI Search endpoint. This URL can be found on the "Overview" page of your Search resource in the Azure Portal.
SEARCH_KEY - the AI Search API key. Found under "Keys" for your Search resource in the Azure Portal.
SEARCH_INDEX_NAME - the name of the index you created with your own data.
import os
import openai
import dotenv
dotenv.load_dotenv()
Authentication
The Azure OpenAI service supports multiple authentication mechanisms that include API keys
and Azure Active Directory token credentials.
use_azure_active_directory = False # Set this flag to True if you are using Azure Active Directory
To set up the OpenAI SDK to use an Azure API Key, we need to set api_key to a key associated
with your endpoint (you can find this key in "Keys and Endpoints" under "Resource Management"
in the Azure Portal). You'll also find the endpoint for your resource here.
if not use_azure_active_directory:
endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_API_KEY"]
# set the deployment name for the model we want to use
deployment = "<deployment-id-of-the-model-to-use>"
client = openai.AzureOpenAI(
base_url=f"{endpoint}/openai/deployments/{deployment}/extensions",
api_key=api_key,
api_version="2023-09-01-preview"
)
To authenticate with Azure Active Directory, we use the DefaultAzureCredential class from the azure-identity package and build a token credential provider through the get_bearer_token_provider helper function. It's recommended to use get_bearer_token_provider over providing a static token to AzureOpenAI because this API will automatically cache and refresh tokens for you.
For more information on how to set up Azure Active Directory authentication with Azure
OpenAI, see the documentation.
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

if use_azure_active_directory:
endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_API_KEY"]
# set the deployment name for the model we want to use
deployment = "<deployment-id-of-the-model-to-use>"
client = openai.AzureOpenAI(
base_url=f"{endpoint}/openai/deployments/{deployment}/extensions",
azure_ad_token_provider=get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiv
api_version="2023-09-01-preview"
)
“Note: the AzureOpenAI client infers the following arguments from their corresponding environment variables if they are not provided: api_key from AZURE_OPENAI_API_KEY, azure_ad_token from AZURE_OPENAI_AD_TOKEN, api_version from OPENAI_API_VERSION, and azure_endpoint from AZURE_OPENAI_ENDPOINT.”
Code
Now we can use Azure on your own data with Chat Completions. Providing our search
endpoint, key, and index name in dataSources , any questions posed to the model will now be
grounded in our own data. An additional property, context , will be provided in the response to
show the data the model referenced to answer the question.
completion = client.chat.completions.create(
messages=[{"role": "user", "content": "What are the differences between Azure Machine Learning an
model=deployment,
extra_body={
"dataSources": [
{
"type": "AzureCognitiveSearch",
"parameters": {
"endpoint": os.environ["SEARCH_ENDPOINT"],
"key": os.environ["SEARCH_KEY"],
"indexName": os.environ["SEARCH_INDEX_NAME"],
}
}
]
}
)
print(f"{completion.choices[0].message.role}: {completion.choices[0].message.content}")
If you would prefer to stream the response from the model, you can pass the stream=True
keyword argument:
response = client.chat.completions.create(
messages=[{"role": "user", "content": "What are the differences between Azure Machine Learning an
model=deployment,
extra_body={
"dataSources": [
{
"type": "AzureCognitiveSearch",
"parameters": {
"endpoint": os.environ["SEARCH_ENDPOINT"],
"key": os.environ["SEARCH_KEY"],
"indexName": os.environ["SEARCH_INDEX_NAME"],
}
}
]
},
stream=True,
)
for chunk in response:
    delta = chunk.choices[0].delta

    if delta.role:
        print("\n" + delta.role + ": ", end="", flush=True)
    if delta.content:
        print(delta.content, end="", flush=True)
    if delta.model_extra.get("context"):
        print(f"Context: {delta.model_extra['context']}", end="", flush=True)
With permission of the author, we will demonstrate how to translate the book Euclidean Plane Geometry, written by Milan Mitrović, from Slovenian into English, without modifying any of the LaTeX commands.
To achieve this, we will first split the book into chunks, each roughly a page long, then translate
each chunk into English, and finally stitch them back together.
1485565
chunks = text.split('\n\n')
ntokens = []
for chunk in chunks:
ntokens.append(len(tokenizer.encode(chunk)))
max(ntokens)
Token indices sequence length is longer than the specified maximum sequence length for this mod
1473
It turns out that a double newline is a good separator in this case, in order not to break the flow
of the text. Also no individual chunk is larger than 1500 tokens. The model we will use is text-
davinci-002, which has a limit of 4096 tokens, so we don't need to worry about breaking the
chunks down further.
We will group the shorter chunks into chunks of around 1000 tokens, to increase the coherence
of the text, and decrease the frequency of breaks within the text.
return batches
chunks = group_chunks(chunks, ntokens)
len(chunks)
869
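Only the tail of group_chunks appears above ("return batches"). A sketch of what such a grouping function might look like, with the 1000-token target taken from the text above:

def group_chunks(chunks, ntokens, max_len=1000):
    """Group very short chunks together to form approximately page-long chunks."""
    batches = []
    cur_batch = ""
    cur_tokens = 0
    for chunk, ntoken in zip(chunks, ntokens):
        if len(chunk.strip()) == 0:
            continue  # discard empty chunks
        if cur_batch and cur_tokens + ntoken > max_len:
            # Adding this chunk would exceed the target length, so flush the current batch.
            batches.append(cur_batch)
            cur_batch = chunk
            cur_tokens = ntoken
        else:
            cur_batch = chunk if not cur_batch else cur_batch + "\n\n" + chunk
            cur_tokens += ntoken + 2  # rough allowance for the separator
    if cur_batch:
        batches.append(cur_batch)
    return batches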
Notice that adding a sample untranslated and translated first command, where only the content
of the chapter name needs to be translated, helps to get more consistent results.
1. A high level instruction to translate only the text, but not commands, into the desired language
2. A sample untranslated command, where only the content of the chapter name needs to be translated
3. The chunk of text to be translated
4. The translated sample command from 2, which shows the model the beginning of the translation process
"""
{sample_translation[0]}
{chunk}"""
{sample_translation[1]}
'''
response = client.chat.completions.create(
messages=[{"role": "user", "content":prompt}],
model=model,
temperature=0,
top_p=1,
max_tokens=1500,
)
result = response.choices[0].message.content.strip()
result = result.replace('"""', '') # remove the double quotes, as we used them to surround the te
return result
print(translate_chunk(chunks[800], model='gpt-3.5-turbo', dest_language='English'))
\item \res{Given are the points $A$ and $B$ on the same side of the line
$p$.
Draw the line $XY$, which lies on the line $p$ and is consistent
with the given line $l$, so that the sum
$|AX|+|XY|+|YB|$ is minimal.}
\item \res{Let $ABC$ be an isosceles right triangle with a right angle at the vertex $A$. What
$\mathcal{G}_{\overrightarrow{AB}}\circ \mathcal{G}_{\overrightarrow{CA}}$ represent?}
Let $p$ and $q$ be the simetrali of the sides $CA$ and $AB$ of the triangle
$ABC$. By \ref{izoZrcDrsKompSrOsn} is:
$$\mathcal{G}_{\overrightarrow{AB}}\circ
\mathcal{G}_{\overrightarrow{CA}}=
\mathcal{S}_q\circ\mathcal{S}_A\circ\mathcal{S}_A\circ\mathcal{S}_p=
\mathcal{S}_q\circ\mathcal{S}_p$$ Because $ABC$ is an isosceles
We can see here that this one chunk in particular translates only the text, but leaves LaTeX
commands intact.
Let's now translate all the chunks in the book - this will take 2-3 hours, as we're processing
requests sequentially.
dest_language = "English"
translated_chunks = []
for i, chunk in enumerate(chunks):
print(str(i+1) + " / " + str(len(chunks)))
# translate each chunk
translated_chunks.append(translate_chunk(chunk, model='gpt-3.5-turbo', dest_language=dest_languag
0 / 869
1 / 869
2 / 869
3 / 869
4 / 869
5 / 869
6 / 869
7 / 869
8 / 869
9 / 869
10 / 869
11 / 869
12 / 869
13 / 869
14 / 869
15 / 869
16 / 869
17 / 869
18 / 869
19 / 869
20 / 869
21 / 869
22 / 869
23 / 869
24 / 869
25 / 869
26 / 869
27 / 869
28 / 869
This notebook takes you through a simple flow to download some data, embed it, and then
index and search it using a selection of vector databases. This is a common requirement for
customers who want to store and search our embeddings with their own data in a secure
environment to support production use cases such as chatbots, topic modelling and more.
Vector databases enable enterprises to take many of the embeddings use cases we've shared in this repo (question and answering, chatbot and recommendation services, for example), and make use of them in a secure, scalable environment. Many of our customers solve their problems with embeddings at small scale, but performance and security hold them back from going into production - we see vector databases as a key component in solving that, and in this guide we'll walk through the basics of embedding text data, storing it in a vector database and using it for semantic search.
Demo Flow
The demo flow is:
Setup: Import packages and set any required variables
MyScale
Setup: Set up the MyScale Python client. For more details go here
Search Data: Run a few example queries with various goals in mind.
Once you've run through this notebook you should have a basic understanding of how to setup
and use vector databases, and can move on to more complex use cases making use of our
embeddings.
Setup
Import the required libraries and set the embedding model that we'd like to use.
import openai
import warnings

# I've set this to our new embeddings model; this can be changed to the embedding model of your choice
EMBEDDING_MODEL = "text-embedding-3-small"

# Ignore unclosed SSL socket warnings - optional in case you get these errors
warnings.filterwarnings(action="ignore", message="unclosed", category=ResourceWarning)
In this section we'll load embedded data that we've prepared previously.
embeddings_url = 'https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip'
import zipfile
with zipfile.ZipFile("vector_database_wikipedia_articles_embedded.zip","r") as zip_ref:
zip_ref.extractall("../data")
article_df = pd.read_csv('../data/vector_database_wikipedia_articles_embedded.csv')
article_df.head()
article_df.info(show_counts=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25000 entries, 0 to 24999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 25000 non-null int64
1 url 25000 non-null object
2 title 25000 non-null object
3 text 25000 non-null object
4 title_vector 25000 non-null object
5 content_vector 25000 non-null object
6 vector_id 25000 non-null object
dtypes: int64(1), object(6)
memory usage: 1.3+ MB
MyScale
MyScale is a database built on Clickhouse that combines vector search and SQL analytics to
offer a high-performance, streamlined, and fully managed experience. It's designed to facilitate
joint queries and analyses on both structured and vector data, with comprehensive SQL support
for all data processing.
Deploy and execute vector search with SQL on your cluster within two minutes by using
MyScale Console.
Connect to MyScale
Follow the connections details section to retrieve the cluster host, username, and password
information from the MyScale console, and use it to create a connection to your cluster as
shown below:
import clickhouse_connect

# initialize client
client = clickhouse_connect.get_client(host='YOUR_CLUSTER_HOST', port=8443, username='YOUR_USERNAME',
Index data
We will create an SQL table called articles in MyScale to store the embeddings data. The
table will include a vector index with a cosine distance metric and a constraint for the length of
the embeddings. Use the following code to create and insert data into the articles table:
client.command(f"""
CREATE TABLE IF NOT EXISTS default.articles
(
id UInt64,
url String,
title String,
text String,
content_vector Array(Float32),
CONSTRAINT cons_vector_len CHECK length(content_vector) = {embedding_len},
VECTOR INDEX article_content_index content_vector TYPE HNSWFLAT('metric_type=Cosine')
)
ENGINE = MergeTree ORDER BY id
""")
batch_size = 100
total_records = len(article_df)
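A minimal sketch of inserting the records in batches with clickhouse_connect; parsing the stringified vectors with literal_eval is an assumption about how the CSV stores them:

from ast import literal_eval

# Convert the stringified embeddings back into lists of floats.
article_df["content_vector"] = article_df["content_vector"].apply(literal_eval)

columns = ["id", "url", "title", "text", "content_vector"]
for i in range(0, total_records, batch_size):
    batch = article_df[i:i + batch_size]
    client.insert(
        "default.articles",
        batch[columns].values.tolist(),
        column_names=columns,
    )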
# check the status of the vector index, make sure vector index is ready with 'Built' status
get_index_status="SELECT status FROM system.vector_indices WHERE name='article_content_index'"
print(f"index build status: {client.command(get_index_status)}")
Search data
Once indexed in MyScale, we can perform vector search to find similar content. First, we will use
the OpenAI API to generate embeddings for our query. Then, we will perform the vector search
using MyScale.
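A minimal sketch of embedding the query before searching; the query text is just an example, and embed is the variable interpolated into the SQL below:

query = "Famous battles in Scottish history"  # example query
embed = openai.Embedding.create(
    input=query,
    model=EMBEDDING_MODEL,
)["data"][0]["embedding"]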
# query the database to find the top K similar content to the given query
top_k = 10
results = client.query(f"""
SELECT id, url, title, distance(content_vector, {embed}) as dist
FROM default.articles
ORDER BY dist
LIMIT {top_k}
""")
# display results
for i, r in enumerate(results.named_results()):
print(i+1, r['title'])
1 Battle of Bannockburn
2 Wars of Scottish Independence
3 1651
4 First War of Scottish Independence
5 Robert I of Scotland
6 841
7 1716
8 1314
9 1263
10 William Wallace
About
The OpenAI Cookbook is an open-source collection of examples and guides for building with
the OpenAI API.
To run these examples, you'll need an OpenAI account and API key. You can create a free
account here.
Most code examples are written in Python, though the concepts can be applied in any
language.
Contributing
This website is automatically generated from our GitHub repository. If there are examples or
guides you'd like to see, feel free to suggest them on the issues page. We are also happy to
accept high quality pull requests, as long as they fit the scope of the cookbook.
Other Resources
Beyond the examples and guides here, you can learn more about OpenAI from the following
resources:
This notebook serves as a tool to preprocess and analyze the chat dataset used for fine-tuning a
chat model. It checks for format errors, provides basic statistics, and estimates token counts for
fine-tuning costs. The method shown here corresponds to the current fine-tuning method for
gpt-3.5-turbo. See legacy fine-tuning for models like babbage-002 and davinci-002.
import json
import tiktoken # for token counting
import numpy as np
from collections import defaultdict
Data loading
data_path = "data/toy_chat_fine_tuning.jsonl"
Num examples: 5
First example:
{'role': 'system', 'content': 'You are a happy assistant that puts a positive spin on everythin
{'role': 'user', 'content': 'I fell off my bike today.'}
{'role': 'assistant', 'content': "It's great that you're getting exercise outdoors!"}
Format validation
We can perform a variety of error checks to validate that each conversation in the dataset
adheres to the format expected by the fine-tuning API. Errors are categorized based on their
nature for easier debugging.
1. Data Type Check: Checks whether each entry in the dataset is a dictionary ( dict ). Error
type: data_type .
2. Presence of Message List: Checks if a messages list is present in each entry. Error type:
missing_messages_list .
3. Message Keys Check: Validates that each message in the messages list contains the keys
role and content . Error type: message_missing_key .
4. Unrecognized Keys in Messages: Logs if a message has keys other than role , content ,
and name . Error type: message_unrecognized_key .
5. Role Validation: Ensures the role is one of "system", "user", or "assistant". Error type:
unrecognized_role .
6. Content Validation: Verifies that content has textual data and is a string. Error type:
missing_content .
7. Assistant Message Presence: Checks that each conversation has at least one message from
the assistant. Error type: example_missing_assistant_message .
The code below performs these checks and prints counts for each type of error found. This is useful for debugging and ensuring the dataset is ready for the next steps.
for ex in dataset:
if not isinstance(ex, dict):
format_errors["data_type"] += 1
continue
if format_errors:
print("Found errors:")
for k, v in format_errors.items():
print(f"{k}: {v}")
else:
print("No errors found")
No errors found
Let's define a few helpful utilities to be used in the rest of the notebook.
encoding = tiktoken.get_encoding("cl100k_base")
# not exact!
# simplified from https://github.com/openai/openai-cookbook/blob/main/examples/How_to_count_tokens_wi
def num_tokens_from_messages(messages, tokens_per_message=3, tokens_per_name=1):
num_tokens = 0
for message in messages:
num_tokens += tokens_per_message
for key, value in message.items():
num_tokens += len(encoding.encode(value))
if key == "name":
num_tokens += tokens_per_name
num_tokens += 3
return num_tokens
def num_assistant_tokens_from_messages(messages):
num_tokens = 0
for message in messages:
if message["role"] == "assistant":
num_tokens += len(encoding.encode(message["content"]))
return num_tokens
With some lightweight analysis we can identify potential issues in the dataset, like missing messages, and provide statistical insights into message and token counts.
1. Missing System/User Messages: Counts conversations that are missing a "system" or "user" message.
2. Number of Messages Per Example: Summarizes the distribution of the number of messages in each conversation.
3. Total Tokens Per Example: Calculates and summarizes the distribution of the total number of tokens in each conversation. Important for understanding fine-tuning costs.
4. Tokens in Assistant's Messages: Calculates and summarizes the number of tokens in the assistant's messages per conversation.
5. Token Limit Warnings: Checks if any examples exceed the maximum token limit (4096 tokens), as such examples will be truncated during fine-tuning, potentially resulting in data loss.
for ex in dataset:
messages = ex["messages"]
if not any(message["role"] == "system" for message in messages):
n_missing_system += 1
if not any(message["role"] == "user" for message in messages):
n_missing_user += 1
n_messages.append(len(messages))
convo_lens.append(num_tokens_from_messages(messages))
assistant_message_lens.append(num_assistant_tokens_from_messages(messages))
1 examples may be over the 4096 token limit, they will be truncated during fine-tuning
Cost Estimation
In this final section, we estimate the total number of tokens that will be used for fine-tuning,
which allows us to approximate the cost. It is worth noting that the duration of the fine-tuning
jobs will also increase with the token count.
TARGET_EPOCHS = 3
MIN_TARGET_EXAMPLES = 100
MAX_TARGET_EXAMPLES = 25000
MIN_DEFAULT_EPOCHS = 1
MAX_DEFAULT_EPOCHS = 25
n_epochs = TARGET_EPOCHS
n_train_examples = len(dataset)
if n_train_examples * TARGET_EPOCHS < MIN_TARGET_EXAMPLES:
n_epochs = min(MAX_DEFAULT_EPOCHS, MIN_TARGET_EXAMPLES // n_train_examples)
elif n_train_examples * TARGET_EPOCHS > MAX_TARGET_EXAMPLES:
n_epochs = max(MIN_DEFAULT_EPOCHS, MAX_TARGET_EXAMPLES // n_train_examples)
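A minimal sketch of the final cost estimate that produces the printout below; MAX_TOKENS_PER_EXAMPLE reflects the 4096-token limit mentioned earlier:

MAX_TOKENS_PER_EXAMPLE = 4096

n_billing_tokens_in_dataset = sum(min(MAX_TOKENS_PER_EXAMPLE, length) for length in convo_lens)
print(f"Dataset has ~{n_billing_tokens_in_dataset} tokens that will be charged for during training")
print(f"By default, you'll train for {n_epochs} epochs on this dataset")
print(f"By default, you'll be charged for ~{n_epochs * n_billing_tokens_in_dataset} tokens")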
Dataset has ~4306 tokens that will be charged for during training
By default, you'll train for 20 epochs on this dataset
By default, you'll be charged for ~86120 tokens
Documentation puts useful information inside other people’s heads. Follow these tips to write
better documentation.
Split content into sections with titles. Section titles act as signposts, telling readers whether to
focus in or move on.
Prefer titles with informative sentences over abstract nouns. For example, if you use a title like
“Results”, a reader will need to hop into the following text to learn what the results actually are.
In contrast, if you use the title “Streaming reduced time to first token by 50%”, it gives the
reader the information immediately, without the burden of an extra hop.
Include a table of contents. Tables of contents help readers find information faster, akin to how
hash maps have faster lookups than linked lists. Tables of contents also have a second, oft
overlooked benefit: they give readers clues about the doc, which helps them understand if it’s
worth reading.
Keep paragraphs short. Shorter paragraphs are easier to skim. If you have an essential point,
consider putting it in its own one-sentence paragraph to reduce the odds it’s missed. Long
paragraphs can bury information.
Begin paragraphs and sections with short topic sentences that give a standalone preview.
When people skim, they look disproportionately at the first word, first line, and first sentence of
a section. Write these sentences in a way that don’t depend on prior text. For example, consider
the first sentence “Building on top of this, let’s now talk about a faster way.” This sentence will
be meaningless to someone who hasn’t read the prior paragraph. Instead, write it in a way that
can be understood standalone: e.g., “Vector databases can speed up embeddings search.”
Put topic words at the beginning of topic sentences. Readers skim most efficiently when they
only need to read a word or two to know what a paragraph is about. Therefore, when writing
topic sentences, prefer putting the topic at the beginning of the sentence rather than the end.
For example, imagine you’re writing a paragraph on vector databases in the middle of a long
article on embeddings search. Instead of writing “Embeddings search can be sped up by vector
databases” prefer “Vector databases speed up embeddings search.” The second sentence is
better for skimming, because it puts the paragraph topic at the beginning of the paragraph.
Put the takeaways up front. Put the most important information at the tops of documents and
sections. Don’t write a Socratic big build up. Don’t introduce your procedure before your results.
Use bullets and tables. Bulleted lists and tables make docs easier to skim. Use them frequently.
Bold important text. Don’t be afraid to bold important text to help readers find it.
Write well
Badly written text is taxing to read. Minimize the tax on readers by writing well.
Keep sentences simple. Split long sentences into two. Cut adverbs. Cut unnecessary words and
phrases. Use the imperative mood, if applicable. Do what writing books tell you.
Write sentences that can be parsed unambiguously. For example, consider the sentence “Title
sections with sentences.” When a reader reads the word “Title”, their brain doesn’t yet know
whether “Title” is going to be a noun or verb or adjective. It takes a bit of brainpower to keep
track as they parse the rest of the sentence, and can cause a hitch if their brain mispredicted the
meaning. Prefer sentences that can be parsed more easily (e.g., “Write section titles as
sentences”) even if longer. Similarly, avoid noun phrases like “Bicycle clearance exercise notice”
which can take extra effort to parse.
Avoid left-branching sentences. Linguistic trees show how words relate to each other in
sentences. Left-branching trees require readers to hold more things in memory than right-
branching sentences, akin to breadth-first search vs depth-first search. An example of a left-
branching sentence is “You need flour, eggs, milk, butter and a dash of salt to make pancakes.”
In this sentence you don’t find out what ‘you need’ connects to until you reach the end of the
sentence. An easier-to-read right-branching version is “To make pancakes, you need flour, eggs,
milk, butter, and a dash of salt.” Watch out for sentences in which the reader must hold onto a
word for a while, and see if you can rephrase them.
Avoid demonstrative pronouns (e.g., “this”), especially across sentences. For example, instead
of saying “Building on our discussion of the previous topic, now let’s discuss function calling” try
“Building on message formatting, now let’s discuss function calling.” The second sentence is
easier to understand because it doesn’t burden the reader with recalling the previous topic.
Look for opportunities to cut demonstrative pronouns altogether: e.g., “Now let’s discuss
function calling.”
Be consistent. Human brains are amazing pattern matchers. Inconsistencies will annoy or
distract readers. If we use Title Case everywhere, use Title Case. If we use terminal commas
everywhere, use terminal commas. If all of the Cookbook notebooks are named with
underscores and sentence case, use underscores and sentence case. Don’t do anything that will
cause a reader to go ‘huh, that’s weird.’ Help them focus on the content, not its inconsistencies.
Don’t tell readers what they think or what to do. Avoid sentences like “Now you probably want
to understand how to call a function” or “Next, you’ll need to learn to call a function.” Both
examples presume a reader’s state of mind, which may annoy them or burn our credibility. Use
phrases that avoid presuming the reader’s state. E.g., “To call a function, …”
Be broadly helpful
People come to documentation with varying levels of knowledge, language proficiency, and
patience. Even if we target experienced developers, we should try to write docs helpful to
everyone.
Write simply. Explain things more simply than you think you need to. Many readers might not
speak English as a first language. Many readers might be really confused about technical
terminology and have little excess brainpower to spend on parsing English sentences. Write
simply. (But don’t oversimplify.)
Avoid abbreviations. Write things out. The cost to experts is low and the benefit to beginners is
high. Instead of IF, write instruction following. Instead of RAG, write retrieval-augmented
generation (or my preferred term: the search-ask procedure).
Offer solutions to potential problems. Even if 95% of our readers know how to install a Python
package or save environment variables, it can still be worth proactively explaining it. Including
explanations is not costly to experts—they can skim right past them. But excluding explanations
is costly to beginners—they might get stuck or even abandon us. Remember that even an
expert JavaScript engineer or C++ engineer might be a beginner at Python. Err on explaining
too much, rather than too little.
Prefer terminology that is specific and accurate. Jargon is bad. Optimize the docs for people
new to the field, instead of ourselves. For example, instead of writing “prompt”, write “input.” Or
instead of writing “context limit” write “max token limit.” The latter terms are more self-evident,
and are probably better than the jargon developed in base model days.
Keep code examples general and exportable. In code demonstrations, try to minimize
dependencies. Don’t make users install extra libraries. Don’t make them have to refer back and
forth between different pages or sections. Try to make examples simple and self-contained.
Prioritize topics by value. Documentation that covers common problems—e.g., how to count
tokens—is magnitudes more valuable than documentation that covers rare problems—e.g., how
to optimize an emoji database. Prioritize accordingly.
Don’t teach bad habits. If API keys should not be stored in code, never share an example that
stores an API key in code.
Introduce topics with a broad opening. For example, if explaining how to program a good
recommender, consider opening by briefly mentioning that recommendations are widespread
across the web, from YouTube videos to Amazon items to Wikipedia. Grounding a narrow topic
with a broad opening can help people feel more secure before jumping into uncertain territory.
And if the text is well-written, those who already know it may still enjoy it.
Note: you will need an OpenAI API key to run this colab.
Use the W&B OpenAI integration to monitor OpenAI API calls and understand how your
projects and teams are leveraging LLMs. In this example, we'll generate templated Weave
Boards: LLM usage monitoring dashboards which you can explore and customize from the UI.
automatically track LLM usage and aggregate useful metrics like cost, latency and
throughput across your projects/teams
dynamically query and derive insights from the logs of all your OpenAI API calls
iterate visually to slice, aggregate, and explore your data; customize panels to focus on
interesting patterns; share progress more easily with your team through an interactive
dashboard
New to Weights & Biases? -> Sign up for an account here <-
Step 0: Setup
Install dependencies, login to W&B so you can save and share your work, and authenticate with
OpenAI.
import weave
import os
from getpass import getpass

WANDB_BASE_URL = "https://api.wandb.ai"
os.environ["WANDB_BASE_URL"] = WANDB_BASE_URL
if os.getenv("OPENAI_API_KEY") is None:
    os.environ["OPENAI_API_KEY"] = getpass("Paste your OpenAI key from: https://platform.openai.com/acc
assert os.getenv("OPENAI_API_KEY", "").startswith("sk-"), "This doesn't look like a valid OpenAI API
print("OpenAI API key configured")
Running this cell will print out a link to view the current project in the Weave UI.
from weave.monitoring import openai, init_monitor
m = init_monitor(f"{WB_ENTITY}/{WB_PROJECT}/{STREAM_NAME}")
keep the Board open in a separate tab and refresh to view the latest data
rename the Board for easier reference at any point and "Publish" that version to share a link
with others
find previously saved Boards by navigating to the relevant W&B entity and W&B project
name from weave.wandb.ai
or open a new instance of a Board template to start fresh with all the data accumulated so
far
Next we'll illustrate a few ways you could track OpenAI API calls. There are many more
possibilities depending on your use case, and we can't wait to see what you create from these
starter templates.
Examples
Monitor a ChatCompletion request and print the corresponding response, extracting only the
text of the completion.
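For example, a monitored call might look roughly like the following sketch; OPENAI_MODEL is assumed to be set earlier, openai here is the patched module imported from weave.monitoring (as in the streaming example below), and the question itself is just an illustration:

# a single monitored ChatCompletion call; the record is logged to the Weave stream
response = openai.ChatCompletion.create(
    model=OPENAI_MODEL,
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
# extract and print only the text of the completion
print(response["choices"][0]["message"]["content"])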
Factor out parameters of interest and track them as attributes on the logged record. Here we
track the "system prompt" separately from the "prompt template" and the "equation"
parameter. This time we'll print the full structured response from the ChatCompletion call.
Monitor a stream of messages and log the result as a single record. Note: tokens are not
counted in this format.
from weave.monitoring.openai import message_from_stream
r = openai.ChatCompletion.create(model=OPENAI_MODEL, messages=[
    {"role": "system", "content": "You are a robot and only speak in robot, like beep bloop bop."},
    {"role": "user", "content": "Tell me a 50-word story."},
], stream=True)
for s in message_from_stream(r):
    print(s, end='')
Here we compare a few toy options for the system prompt, user question, and intended
audience. Try your own experiments and see if any interesting insights emerge as you explore in
the Board and group by different parameters.
Complex tasks, such as writing unit tests, can benefit from multi-step prompts. In contrast to a
single prompt, a multi-step prompt generates text from GPT and then feeds that output text
back into subsequent prompts. This can help in cases where you want GPT to reason things out
before answering, or brainstorm a plan before executing it.
In this notebook, we use a 3-step prompt to write unit tests in Python using the following steps:
1. Explain: Given a Python function, we ask GPT to explain what the function is doing and
why.
2. Plan: We ask GPT to plan a set of unit tests for the function.
If the plan is too short, we ask GPT to elaborate with more ideas for unit tests.
3. Execute: Finally, we instruct GPT to write unit tests that cover the planned cases.
The code example illustrates a few embellishments on the chained, multi-step prompt:
Conditional branching (e.g., asking for elaboration only if the first plan is too short)
A check that re-runs the function if the output is unsatisfactory (e.g., if the output code
cannot be parsed by Python's ast module)
Streaming output so that you can start reading the output before it's fully generated
(handy for long, multi-step outputs)
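The parsing check mentioned above can be as simple as attempting to parse the generated code with Python's ast module and retrying on failure; a rough sketch (this retry wrapper is illustrative, not the notebook's exact implementation):

import ast

def generate_with_retries(function_to_test, max_retries=3):
    # Retry generation until the output parses as valid Python (or we give up).
    for attempt in range(max_retries):
        candidate = unit_tests_from_function(function_to_test, print_text=False)
        try:
            ast.parse(candidate)  # raises SyntaxError if the output is not parseable
            return candidate
        except SyntaxError:
            print(f"Attempt {attempt + 1}: output could not be parsed, retrying...")
    raise RuntimeError("Failed to generate syntactically valid unit tests")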
color_prefix_by_role = {
"system": "\033[0m", # gray
"user": "\033[0m", # gray
"assistant": "\033[92m", # green
}
# create a markdown-formatted message that asks GPT to explain the function, formatted as a bulle
explain_system_message = {
"role": "system",
"content": "You are a world-class Python developer with an eagle eye for unintended bugs and
}
explain_user_message = {
"role": "user",
"content": f"""Please explain the following Python function. Review what each element of the
```python
{function_to_test}
```""",
}
explain_messages = [explain_system_message, explain_user_message]
if print_text:
    print_messages(explain_messages)
explanation_response = client.chat.completions.create(
    model=explain_model,
    messages=explain_messages,
    temperature=temperature,
    stream=True,
)
explanation = ""
for chunk in explanation_response:
    delta = chunk.choices[0].delta
    if print_text:
        print_message_delta(delta)
    if delta.content:
        explanation += delta.content
explain_assistant_message = {"role": "assistant", "content": explanation}
# Asks GPT to plan out cases the units tests should cover, formatted as a bullet list
plan_user_message = {
"role": "user",
"content": f"""A good unit test suite should aim to:
- Test the function's behavior for a wide range of possible inputs
- Test edge cases that the author may not have foreseen
- Take advantage of the features of `{unit_test_package}` to make the tests easy to write and maintai
- Be easy to read and understand, with clean code and descriptive names
- Be deterministic, so that the tests always pass or fail in the same way
To help unit test the function above, list diverse scenarios that the function should be able to hand
}
plan_messages = [
explain_system_message,
explain_user_message,
explain_assistant_message,
plan_user_message,
]
if print_text:
    print_messages([plan_user_message])
plan_response = client.chat.completions.create(
    model=plan_model,
    messages=plan_messages,
    temperature=temperature,
    stream=True,
)
plan = ""
for chunk in plan_response:
    delta = chunk.choices[0].delta
    if print_text:
        print_message_delta(delta)
    if delta.content:
        plan += delta.content
plan_assistant_message = {"role": "assistant", "content": plan}
```python
# imports
import {unit_test_package} # used for our unit tests
{{insert other imports as needed}}
# function to test
{function_to_test}
# unit tests
{package_comment}
{{insert unit test code here}}
```""",
}
execute_messages = [
execute_system_message,
explain_user_message,
explain_assistant_message,
plan_user_message,
plan_assistant_message,
]
if elaboration_needed:
    execute_messages += [elaboration_user_message, elaboration_assistant_message]
execute_messages += [execute_user_message]
if print_text:
    print_messages([execute_system_message, execute_user_message])
execute_response = client.chat.completions.create(
    model=execute_model,
    messages=execute_messages,
    temperature=temperature,
    stream=True,
)
execution = ""
for chunk in execute_response:
    delta = chunk.choices[0].delta
    if print_text:
        print_message_delta(delta)
    if delta.content:
        execution += delta.content
    words = text.lower().split()
    translated_words = [translate(word) for word in words]
    return ' '.join(translated_words)
"""
unit_tests = unit_tests_from_function(
example_function,
approx_min_cases_to_cover=10,
print_text=True
)
[0m
[system]
You are a world-class Python developer with an eagle eye for unintended bugs and edge cases. Yo
[0m
[user]
Please explain the following Python function. Review what each element of the function is doing
```python
def pig_latin(text):
    def translate(word):
        vowels = 'aeiou'
        if word[0] in vowels:
            return word + 'way'
        else:
            consonants = ''
            for letter in word:
                if letter not in vowels:
                    consonants += letter
                else:
                    break
            return word[len(consonants):] + consonants + 'ay'
    words = text.lower().split()
    translated_words = [translate(word) for word in words]
    return ' '.join(translated_words)
```
[0m
[user]
print(unit_tests)
# imports
import pytest
# function to test
def pig_latin(text):
    def translate(word):
        vowels = 'aeiou'
        if word[0] in vowels:
            return word + 'way'
        else:
            consonants = ''
            for letter in word:
                if letter not in vowels:
                    consonants += letter
                else:
                    break
            return word[len(consonants):] + consonants + 'ay'
    words = text.lower().split()
    translated_words = [translate(word) for word in words]
    return ' '.join(translated_words)

# unit tests
@pytest.mark.parametrize('text, expected', [
    ('hello world', 'ellohay orldway'),  # basic test case
    ('Python is awesome', 'ythonPay isway awesomeway'),  # test case with multiple words
    ('apple', 'appleway'),  # test case with a word starting with a vowel
('' '') # t t ith t t i
Make sure to check any code before using it, as GPT makes plenty of mistakes (especially on
character-based tasks like this one). For best results, use the most powerful model (GPT-4, as of
May 2023).
This notebook guides you step by step on using AnalyticDB as a vector database for OpenAI
embeddings.
4. Using AnalyticDB to perform the nearest neighbour search in the created collection.
What is AnalyticDB
AnalyticDB is a high-performance, distributed vector database. Because it is fully compatible with
PostgreSQL syntax, it is easy to adopt. AnalyticDB is Alibaba Cloud's managed, cloud-native
database with a high-performance vector compute engine. It offers an out-of-the-box experience
that scales to billions of vectors, with rich features including indexing algorithms, support for
structured and unstructured data, real-time updates, multiple distance metrics, scalar filtering,
and time-travel search. It also provides full OLAP database functionality and an SLA commitment
for production use.
Deployment options
Using AnalyticDB Cloud Vector Database. Click here to fast deploy it.
Prerequisites
For the purposes of this exercise we need to prepare a couple of things:
2. The 'psycopg2' library to interact with the vector database. Any other postgresql client
library is ok.
We can validate that the server was launched successfully by running a simple curl command:
Install requirements
This notebook obviously requires the openai and psycopg2 packages, but there are also some
other additional libraries we will use. The following command installs them all:
The OpenAI API key is used for vectorization of the documents and queries.
If you don't have an OpenAI API key, you can get one from
https://beta.openai.com/account/api-keys.
Once you get your key, please add it to your environment variables as OPENAI_API_KEY .
# Test that your OpenAI API key is correctly set as an environment variable
# Note. if you run this notebook locally, you will need to reload your terminal and the notebook for
import os
# Note. alternatively you can set a temporary env variable like this:
# os.environ["OPENAI_API_KEY"] = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
Connect to AnalyticDB
First, add the connection details to your environment variables, or simply change the
psycopg2.connect parameters below.
Connecting to a running instance of AnalyticDB server is easy with the official Python library:
import os
import psycopg2
# Note. alternatively you can set temporary env variables like this:
# os.environ["PGHOST"] = "your_host"
# os.environ["PGPORT"] = "5432"
# os.environ["PGDATABASE"] = "postgres"
# os.environ["PGUSER"] = "user"
# os.environ["PGPASSWORD"] = "password"
connection = psycopg2.connect(
host=os.environ.get("PGHOST", "localhost"),
port=os.environ.get("PGPORT", "5432"),
database=os.environ.get("PGDATABASE", "postgres"),
user=os.environ.get("PGUSER", "user"),
password=os.environ.get("PGPASSWORD", "password")
)
import wget
embeddings_url = "https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedde
'vector_database_wikipedia_articles_embedded.zip'
import zipfile
import os
import re
import tempfile
current_directory = os.getcwd()
zip_file_path = os.path.join(current_directory, "vector_database_wikipedia_articles_embedded.zip")
output_directory = os.path.join(current_directory, "../../data")
if os.path.exists(file_path):
    print(f"The file {file_name} exists in the data directory.")
else:
    print(f"The file {file_name} does not exist in the data directory.")
AnalyticDB stores data in relations, where each object is described by at least one vector. Our
relation will be called articles, and each object will be described by both a title vector and a
content vector.
We will start by creating the relation and adding a vector index on both title and content, and
then we will fill it with our precomputed embeddings.
create_table_sql = '''
CREATE TABLE IF NOT EXISTS public.articles (
id INTEGER NOT NULL,
url TEXT,
title TEXT,
content TEXT,
title_vector REAL[],
content_vector REAL[],
vector_id INTEGER
);
CREATE INDEX ON public.articles USING ann (title_vector) WITH (distancemeasure = l2, dim = '1536', pq
'''
Load data
In this section we are going to load the data prepared previously, so you don't have to
recompute the embeddings of Wikipedia articles with your own credits.
import io
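Only the import is shown above. One way to do the actual load, sketched here under the assumption that the unzipped CSV sits in the output directory and has the columns id, url, title, text, title_vector, content_vector and vector_id, is to rewrite the JSON-style embedding lists as PostgreSQL array literals and stream the file in with COPY:

import csv

csv_file_path = os.path.join(output_directory, "vector_database_wikipedia_articles_embedded.csv")

with open(csv_file_path, "r", encoding="utf-8") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for row in reader:
        # Rewrite the embeddings from JSON-style lists ([...]) to
        # PostgreSQL array literals ({...}) so they load into REAL[] columns.
        row[4] = row[4].replace("[", "{").replace("]", "}")
        row[5] = row[5].replace("[", "{").replace("]", "}")
        writer.writerow(row)
    buffer.seek(0)
    cursor = connection.cursor()
    cursor.copy_expert(
        "COPY public.articles (id, url, title, content, title_vector, content_vector, vector_id) FROM STDIN WITH (FORMAT CSV)",
        buffer,
    )
    connection.commit()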
# Check the collection size to make sure all the points have been stored
count_sql = """select count(*) from public.articles;"""
cursor.execute(count_sql)
result = cursor.fetchone()
print(f"Count:{result[0]}")
Count:25000
Search data
Once the data is loaded into AnalyticDB, we will start querying the collection for the closest vectors. We
may provide an additional parameter vector_name to switch from title-based to content-based search.
Since the precomputed embeddings were created with the text-embedding-3-small OpenAI model,
we also have to use it during search.
return results
import openai
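The search helper itself appears only as a fragment above. A rough sketch of its full shape is given below; the distance function and array syntax are assumptions that may differ across AnalyticDB versions, so treat this as illustrative rather than canonical:

def query_analyticdb(query, collection_name="articles", vector_name="title_vector", top_k=20):
    # Embed the query with the same model used for the stored vectors.
    embedded_query = openai.Embedding.create(
        input=query,
        model="text-embedding-3-small",
    )["data"][0]["embedding"]

    # Format the embedding as a PostgreSQL real[] literal.
    embedded_query_pg = "{" + ",".join(map(str, embedded_query)) + "}"

    # Order by L2 distance between the stored vector and the query vector (assumed function name).
    query_sql = f"""
    SELECT id, url, title, l2_distance({vector_name}, '{embedded_query_pg}'::real[]) AS distance
    FROM {collection_name}
    ORDER BY distance
    LIMIT {top_k};
    """
    cursor = connection.cursor()
    cursor.execute(query_sql)
    results = cursor.fetchall()
    return results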
Redis
Sam Partee
Open in Github
Feb 12, 2023
What is Redis?
Most developers from a web services background are probably familiar with Redis. At its core,
Redis is an open-source key-value store that can be used as a cache, message broker, and
database. Developers choose Redis because it is fast, has a large ecosystem of client libraries,
and has been deployed by major enterprises for years.
In addition to these traditional uses, Redis also provides Redis Modules, which are a way
to extend Redis with new capabilities, commands, and data types. Example modules include
RedisJSON, RedisTimeSeries, RedisBloom, and RediSearch.
Deployment options
There are a number of ways to deploy Redis. For local development, the quickest method is to
use the Redis Stack docker container which we will use here. Redis Stack contains a number of
Redis modules that can be used together to create a fast, multi-model data store and query
engine.
For production use cases, the easiest way to get started is to use the Redis Cloud service, a fully
managed Redis offering. You can also deploy Redis on your own infrastructure using Redis
Enterprise, which can run in Kubernetes, on-premises, or in the cloud.
Additionally, every major cloud provider (AWS Marketplace, Google Marketplace, or Azure
Marketplace) offers Redis Enterprise in a marketplace offering.
What is RediSearch?
RediSearch is a Redis module that provides querying, secondary indexing, full-text search and
vector search for Redis. To use RediSearch, you first declare indexes on your Redis data. You can
then use the RediSearch clients to query that data. For more information on the feature set of
RediSearch, see the RediSearch documentation.
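To make the declare-then-query flow concrete, here is a minimal sketch using the redis-py client; the index name, field names, and vector dimension are illustrative:

import numpy as np
import redis
from redis.commands.search.field import TextField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis(host="localhost", port=6379)

# Declare a RediSearch index over hashes with the key prefix "doc:".
schema = (
    TextField("title"),
    VectorField("embedding", "HNSW", {"TYPE": "FLOAT32", "DIM": 1536, "DISTANCE_METRIC": "COSINE"}),
)
r.ft("doc_idx").create_index(schema, definition=IndexDefinition(prefix=["doc:"], index_type=IndexType.HASH))

# Query the index: a KNN vector search returning the 5 nearest documents.
query_vec = np.random.rand(1536).astype(np.float32).tobytes()  # stand-in for a real embedding
q = (
    Query("*=>[KNN 5 @embedding $vec AS score]")
    .sort_by("score")
    .return_fields("title", "score")
    .dialect(2)
)
results = r.ft("doc_idx").search(q, query_params={"vec": query_vec})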
Features
RediSearch uses compressed, inverted indexes for fast indexing with a low memory footprint.
RediSearch indexes enhance Redis by providing exact-phrase matching, fuzzy search, and
numeric filtering, among many other features, such as:
Field weighting
Given the large ecosystem around Redis, there are most likely client libraries in the language
you need. You can use any standard Redis client library to run RediSearch commands, but it's
easiest to use a library that wraps the RediSearch API. Below are a few examples, but you can
find more client libraries here.
Deployment Options
There are many ways to deploy Redis with RediSearch. The easiest way to get started is to use
Docker, but there are many potential options for deployment, such as:
Redis Cloud
Docker (RediSearch)
RediSearch has a distributed cluster version that scales to billions of documents across
hundreds of servers. At the moment, distributed RediSearch is available as part of Redis
Enterprise Cloud and Redis Enterprise Software.
Examples
Product Search - eCommerce product search (with image and text)
Redis VSS in RecSys - 3 end-to-end Redis & NVIDIA Merlin Recommendation System
Architectures.
Azure OpenAI Embeddings Q&A - OpenAI and Redis as a Q&A service on Azure.
More Resources
For more information on how to use Redis as a vector database, check out the following
resources:
Redis Vector Similarity Docs - Redis official docs for Vector Search.
Vector Similarity Search: From Basics to Production - Introductory blog post to VSS and
Redis as a VectorDB.
AI-Powered Document Search - Blog post covering AI Powered Document Search Use
Cases & Architectures.
This notebook covers how to fine-tune to increase function calling accuracy and reliability.
You can find more information on function calling here, and on fine tuning here
“ tools is an optional parameter in the Chat Completion API which can be used to provide
function specifications. The purpose of this is to enable models to generate function
arguments which adhere to the provided specifications. Note that the API will not actually
execute any function calls. It is up to developers to execute function calls using model
outputs.”
Function calling is a very powerful tool when it functions as intended. However, we have seen
that as the number of functions and the complexity of the task at hand increase, function calling
becomes less accurate (e.g., more hallucinated invocations and incorrect invocations).
Before fine tuning for function calling, it's best to begin with:
Improvements to the function definitions. Make them more clear, and more distinct from
one another.
Experiment with prompt engineering: often a more detailed prompt can help the model call
the correct function.
If the steps above fail to improve function calling to a satisfactory level, then you can try fine
tuning for function calling.
Overview
Generating synthetic data: Using gpt-4 to create a 'golden' set of prompts and function
invocations to use as training data
Fine-tuning: Running the fine tuning job, and evaluating the fine-tuned model
Note: This notebook provides an example of how to create synthetic training data for fine-tuning
for function calling, given just a list of functions. While real-world production test evals are
preferable, this method produces strong results and can be used in conjunction with real-world
training data.
import numpy as np
import json
import os
from openai import OpenAI
import itertools
from tenacity import retry, wait_random_exponential, stop_after_attempt
from typing import Any, Dict, List, Generator
import ast
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>
Utilities
Let's define utility functions for making calls to the Chat Completions API, one to get the
completion and one to get the function call.
def get_chat_completion(
    messages: list[dict[str, str]],
    model: str = "gpt-3.5-turbo",
    max_tokens=500,
    temperature=1.0,
    stop=None,
    tools=None,
) -> str:
    params = {
        'model': model,
        'messages': messages,
        'max_tokens': max_tokens,
        'temperature': temperature,
        'stop': stop,
        'tools': tools,
    }
    completion = client.chat.completions.create(**params)
    return completion.choices[0].message
Baseline testing
Let's build an intelligent drone co-pilot. We want to be able to give the co-pilot commands, and
have it either call the function for that command, or deny that request if the command is
unfeasible. We can first define a system prompt for the copilot.
DRONE_SYSTEM_PROMPT = """You are an intelligent AI that controls a drone. Given a command or request
call one of your functions to complete the request. If the request cannot be completed by your availa
If the request is ambiguous or unclear, reject the request."""
Now let's define functions for all of the actions the copilot can take.
function_list = [
{
"type": "function",
"function": {
"name": "takeoff_drone",
"description": "Initiate the drone's takeoff sequence.",
"parameters": {
"type": "object",
"properties": {
"altitude": {
"type": "integer",
"description": "Specifies the altitude in meters to which the drone should as
}
},
"required": ["altitude"],
},
},
},
{
"type": "function",
"function": {
"name": "land_drone",
"description": "Land the drone at its current location or a specified landing point.",
"parameters": {
"type": "object",
"properties": {
"location": {
"type": "string",
"enum": ["current", "home_base", "custom"],
"description": "Specifies the landing location for the drone.",
},
"coordinates": {
"type": "object",
"description": "GPS coordinates for custom landing location. Required if loca
},
},
"required": ["location"],
},
},
},
{
"type": "function",
"function": {
"name": "control_drone_movement",
"description": "Direct the drone's movement in a specific direction.",
"parameters": {
"type": "object",
"properties": {
"direction": {
"type": "string",
"enum": ["forward", "backward", "left", "right", "up", "down"],
"description": "Direction in which the drone should move.",
},
"distance": {
"type": "integer",
"description": "Distance in meters the drone should travel in the specified d
},
},
"required": ["direction", "distance"],
},
},
},
{
"type": "function",
"function": {
"name": "set_drone_speed",
"description": "Adjust the speed of the drone.",
"parameters": {
"type": "object",
"properties": {
"speed": {
"type": "integer",
"description": "Specifies the speed in km/h.",
}
},
"required": ["speed"],
},
},
},
{
"type": "function",
"function": {
"name": "control_camera",
"description": "Control the drone's camera to capture images or videos.",
"parameters": {
"type": "object",
"properties": {
"mode": {
"type": "string",
"enum": ["photo", "video", "panorama"],
"description": "Camera mode to capture content.",
},
"duration": {
"type": "integer",
"description": "Duration in seconds for video capture. Required if mode is 'v
},
},
"required": ["mode"],
},
},
},
{
"type": "function",
"function": {
"name": "control_gimbal",
"description": "Adjust the drone's gimbal for camera stabilization and direction.",
"parameters": {
"type": "object",
"properties": {
"tilt": {
"type": "integer",
"description": "Tilt angle for the gimbal in degrees.",
},
"pan": {
"type": "integer",
"description": "Pan angle for the gimbal in degrees.",
},
},
"required": ["tilt", "pan"],
},
},
},
{
"type": "function",
"function": {
"name": "set_drone_lighting",
"description": "Control the drone's lighting for visibility and signaling.",
"parameters": {
"type": "object",
"properties": {
"mode": {
"type": "string",
"enum": ["on", "off", "blink", "sos"],
"description": "Lighting mode for the drone.",
}
},
"required": ["mode"],
},
},
},
{
"type": "function",
"function": {
"name": "return_to_home",
"description": "Command the drone to return to its home or launch location.",
"parameters": {"type": "object", "properties": {}},
},
},
{
"type": "function",
"function": {
"name": "set_battery_saver_mode",
"description": "Toggle battery saver mode.",
"parameters": {
"type": "object",
"properties": {
"status": {
"type": "string",
"enum": ["on", "off"],
"description": "Toggle battery saver mode.",
}
},
"required": ["status"],
},
},
},
{
"type": "function",
"function": {
"name": "set_obstacle_avoidance",
"description": "Configure obstacle avoidance settings.",
"parameters": {
"type": "object",
"properties": {
"mode": {
"type": "string",
"enum": ["on", "off"],
"description": "Toggle obstacle avoidance.",
}
},
"required": ["mode"],
},
},
},
{
"type": "function",
"function": {
"name": "set_follow_me_mode",
"description": "Enable or disable 'follow me' mode.",
"parameters": {
"type": "object",
"properties": {
"status": {
"type": "string",
"enum": ["on", "off"],
"description": "Toggle 'follow me' mode.",
}
},
"required": ["status"],
},
},
},
{
"type": "function",
"function": {
"name": "calibrate_sensors",
"description": "Initiate calibration sequence for drone's sensors.",
"parameters": {"type": "object", "properties": {}},
},
},
{
"type": "function",
"function": {
"name": "set_autopilot",
"description": "Enable or disable autopilot mode.",
"parameters": {
"type": "object",
"properties": {
"status": {
"type": "string",
"enum": ["on", "off"],
"description": "Toggle autopilot mode.",
}
},
"required": ["status"],
},
},
},
{
"type": "function",
"function": {
"name": "configure_led_display",
"description": "Configure the drone's LED display pattern and colors.",
"parameters": {
"type": "object",
"properties": {
"pattern": {
"type": "string",
"enum": ["solid", "blink", "pulse", "rainbow"],
"description": "Pattern for the LED display.",
},
"color": {
"type": "string",
"enum": ["red", "blue", "green", "yellow", "white"],
"description": "Color for the LED display. Not required if pattern is 'rainbo
},
},
"required": ["pattern"],
},
},
},
{
"type": "function",
"function": {
"name": "set_home_location",
"description": "Set or change the home location for the drone.",
"parameters": {
"type": "object",
"properties": {
"coordinates": {
"type": "object",
"description": "GPS coordinates for the home location.",
}
},
"required": ["coordinates"],
},
},
},
{
"type": "function",
"function": {
"name": "reject_request",
"description": "Use this function if the request is not possible.",
"parameters": {"type": "object", "properties": {}},
},
},
]
For starters, let's see how function calling performs with some straightforward, feasible
prompts, and then one obviously impossible request which should call the 'reject_request' function.
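For illustration, the baseline check can follow the same pattern as the evaluation loop later in this notebook; the example commands below are hypothetical:

baseline_prompts = [
    "Take off the drone and climb to 100 meters.",  # feasible
    "Land the drone at home base.",                 # feasible
    "Record a 30 second video.",                    # feasible
    "Make the drone cook me dinner.",               # obviously impossible
]
for prompt in baseline_prompts:
    messages = [
        {"role": "system", "content": DRONE_SYSTEM_PROMPT},
        {"role": "user", "content": prompt},
    ]
    completion = get_chat_completion(messages, tools=function_list)
    print(prompt)
    print(completion.tool_calls[0].function.name, "\n")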
Nice! The model performs quite well with these requests. Now let's try some more difficult
requests: requests that are almost feasible and are drone-related, but that the drone cannot
actually do, and the pilot should reject.
We want to generate every invocation of every function, so that we have full coverage of all
potential invocations to create synthetic data for. Then, we will use gpt-4 to come up with
prompts that would call each invocation, and we will use that prompt - function invocation pair
as training data.
Generating every invocation for a function with fixed enums is simpler, but for a function
such as control_gimbal we need to set the tilt and pan integer values. To generate
those synthetic invocations, we will first set a placeholder and later use gpt-4 to come up
with reasonable values.
placeholder_int = 'fill_in_int'
placeholder_string = 'fill_in_string'
The functions below take in all the functions from the function list, and look at all the potential
invocations of those functions given each function's parameters. The functions also account for
required parameters, so that all the invocations are actually feasible.
def generate_permutations(params: Dict[str, Dict[str, Any]]) -> Generator[Dict[str, Any], None, None]
"""
Generates all possible permutations for given parameters.
# Determine the fields that are optional by subtracting the base permutation's fields from all pr
optional_fields = set(params['properties']) - set(base_perm)
# Create a new permutation by combining base permutation and current field values
new_perm = {**base_perm, **dict(zip(field_subset, values))}
yield new_perm
# Based on the field's type or presence of 'enum', determine and return the possible values
if 'enum' in field_info:
return field_info['enum']
elif field_info['type'] == 'integer':
return [placeholder_int]
elif field_info['type'] == 'string':
return [placeholder_string]
elif field_info['type'] == 'boolean':
return [True, False]
elif field_info['type'] == 'array' and 'enum' in field_info['items']:
enum_values = field_info['items']['enum']
all_combinations = [list(combo) for i in range(1, len(enum_values) + 1) for combo in itertool
return all_combinations
return []
Prompts:
INVOCATION_FILLER_PROMPT = """
1) Input reasonable values for 'fill_in_string' and 'fill_in_int' in the invocation here: {invocation
the entire function provided here :{function} to get context over what proper fill_in_string and fill
Example:
Input: invocation: {{
"name": "control_camera",
"arguments": {{
"mode":"video",
"duration":"fill_in_int"
}}
}},
function:{function}
Output: invocation: {{
"name": "control_camera",
"arguments": {{
"mode":"video",
"duration": 30
}}
}}
MAKE SURE output is just a dictionary with keys 'name' and 'arguments', no other text or response.
Input: {invocation}
Output:
"""
COMMAND_GENERATION_PROMPT= """
You are to output 2 commands, questions or statements that would generate the inputted function and p
Please make the commands or questions natural, as a person would ask, and the command or questions sh
It should not always mirror the exact technical terminology used in the function and parameters, rath
For instance, the prompt should not be 'turn on the dome light', as that is too technical, but rather
Another example, is the prompt should not be 'turn on the HVAC', but rather 'turn on the air conditio
it is technically incorrect but colloquially used.
RULES: ALWAYS put a backwards slash before an apostrophe or single quote '. For example, do not say d
Prompts MUST be in double quotes as well.
Example
Input: {invocation}
Prompt:
"""
In the snippet below, we generate the invocation of each function except for the
reject_request function.
To perform effective fine-tuning we need correctly labeled data. We could manually come up
with examples and label the data, or we can generate synthetic data with the help of gpt-4.
Empirically, gpt-4 needs a bit more help to get good, realistic examples of prompts that would
generate the reject_request function, so we'll do that next...
input_objects = []
all_but_reject = [f for f in function_list if f['function']['name'] != 'reject_request']
input_objects.append(input_object)
Now that we have all the invocations, let's use gpt-4 to generate prompts that would result in
those invocations
def create_commands(invocation_list):
    example_list = []
    for i, invocation in enumerate(invocation_list):
        print(f'\033[34m{np.round(100*i/len(invocation_list),1)}% complete\033[0m')
        print(invocation)
training_examples_unformatted = create_commands(input_objects)
[34m0.0% complete[0m
{'name': 'takeoff_drone', 'arguments': {'altitude': 100}}
[34m1.8% complete[0m
{'name': 'land_drone', 'arguments': {'location': 'current'}}
[34m3.5% complete[0m
{'name': 'land_drone', 'arguments': {'location': 'home_base'}}
[34m5.3% complete[0m
{'name': 'land_drone', 'arguments': {'location': 'custom'}}
[34m7.0% complete[0m
{'name': 'control_drone_movement', 'arguments': {'direction': 'forward', 'distance': 50}}
[34m8.8% complete[0m
{'name': 'control_drone_movement', 'arguments': {'direction': 'backward', 'distance': 10}}
[34m10.5% complete[0m
{'name': 'control_drone_movement', 'arguments': {'direction': 'left', 'distance': 10}}
[34m12.3% complete[0m
{'name': 'control_drone_movement', 'arguments': {'direction': 'right', 'distance': 10}}
[34m14.0% complete[0m
{'name': 'control_drone_movement', 'arguments': {'direction': 'up', 'distance': 20}}
[34m15.8% complete[0m
{'name': 'control_drone_movement', 'arguments': {'direction': 'down', 'distance': 10}}
[34m17.5% complete[0m
{'name': 'set_drone_speed', 'arguments': {'speed': 20}}
[34m19.3% complete[0m
{'name': 'control_camera', 'arguments': {'mode': 'photo'}}
[34m21.1% complete[0m
{'name': 'control_camera', 'arguments': {'mode': 'photo', 'duration': 0}}
[34m22.8% complete[0m
{'name': 'control_camera', 'arguments': {'mode': 'video'}}
[34m24.6% complete[0m
Now let's format the training examples properly. For more documentation on the proper
training data formatting for fine tuning for function calling, see here:
https://platform.openai.com/docs/guides/fine-tuning/fine-tuning-examples
training_examples = []
for prompt in training_examples_unformatted:
    # adjust formatting for training data specs
    try:
        prompt["Input"] = ast.literal_eval(prompt["Input"])
    except:
        continue
    prompt['Input']['arguments'] = json.dumps(prompt['Input']['arguments'])
    for p in prompt['Prompt']:
        training_examples.append({"messages": [{"role": "system", "content": DRONE_SYSTEM_PROMPT},
                                               {"role": "user", "content": p},
                                               {"role": "assistant", "function_call": prompt['Input']}],
                                  "functions": [func['function'] for func in function_list]})
Now, back to the rejection function. Let's generate some prompts that are nearly possible, but
should result in the reject_request function being called. To do so, we queried gpt-4 asking
for requests that are related to, but not quite possible with, the given list of functions.
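For illustration, such a list might look like the following; these particular requests are hypothetical stand-ins for the gpt-4-generated ones:

# Hypothetical near-feasible requests that should all map to reject_request.
reject_list = [
    "Do a barrel roll with the drone.",
    "Fly the drone underwater to film the reef.",
    "Have the drone pick up a package from my porch.",
    "Scan the field with thermal imaging.",
    "Play some music through the drone's speakers.",
]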
reject_training_list = []
for prompt in reject_list:
#Adjust formatting
reject_training_list.append({"messages": [{"role":"system","content":DRONE_SYSTEM_PROMPT
},{"role":"user","content": prompt},
{"role":"assistant","function_call": {"name": "reject_request","arguments": "
"functions":[func['function'] for func in function_list]})
training_list_total = training_examples+reject_training_list
training_file = 'data/drone_training.jsonl'
with open(training_file, 'w') as f:
    for item in training_list_total:
        json_str = json.dumps(item)
        f.write(f'{json_str}\n')
Fine tuning
Finally, we can kick off the fine-tuning job
if __name__ == "__main__":
    file = client.files.create(
        file=open(training_file, "rb"),
        purpose="fine-tune",
    )
    file_id = file.id
    print(file_id)
    ft = client.fine_tuning.jobs.create(
        model="gpt-3.5-turbo",
        training_file=file_id,
    )
file-CGMggG5iZYKTocwgCp7kV7C6
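Fine-tuning jobs take a while to complete. A small sketch of how you might poll the job and read back the resulting model name:

import time

# Poll the fine-tuning job until it reaches a terminal state.
while True:
    job = client.fine_tuning.jobs.retrieve(ft.id)
    print(job.status)
    if job.status in ("succeeded", "failed", "cancelled"):
        break
    time.sleep(60)

fine_tuned_model = job.fine_tuned_model  # e.g. "ft:gpt-3.5-turbo-0613:..."
print(fine_tuned_model)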
Evaluations
Great! We trained a fine-tuned model for function calling. Let's see how it does on our
evaluation set for prompts that the drone assistant should automatically reject.
for eval_question in challenging_prompts:
    messages = []
    messages.append({"role": "system", "content": DRONE_SYSTEM_PROMPT})
    messages.append({"role": "user", "content": eval_question})
    completion = get_chat_completion(model="ft:gpt-3.5-turbo-0613:openai-internal::8DloQKS2",messages=m
    print(eval_question)
    print(completion.tool_calls[0].function.name,'\n')
Great! While the original model only rejected 1 of the 5 requests, the fine tuned model rejected
all 5 requests.
Conclusion
Congratulations! You are now ready to fine tune your model for function calling. We can't wait
to see what you build.
Classification using embeddings
Ted Sanders, Logan Kilpatrick
Open in Github
Jul 11, 2022
There are many ways to classify text. This notebook shares an example of text classification
using embeddings. For many text classification tasks, we've seen fine-tuned models do better
than embeddings. See an example of fine-tuned models for classification in
Fine-tuned_classification.ipynb. We also recommend having more examples than embedding
dimensions, which we don't quite achieve here.
In this text classification task, we predict the score of a food review (1 to 5) based on the
embedding of the review's text. We split the dataset into a training and a testing set for all the
following tasks, so we can realistically evaluate performance on unseen data. The dataset is
created in the Get_embeddings_from_dataset Notebook.
import pandas as pd
import numpy as np
from ast import literal_eval
datafile_path = "data/fine_food_reviews_with_embeddings_1k.csv"
df = pd.read_csv(datafile_path)
df["embedding"] = df.embedding.apply(literal_eval).apply(np.array) # convert string to array
We can see that the model has learnt to distinguish between the categories decently. 5-star
reviews show the best performance overall, and this is not too surprising, since they are the
most common in the dataset.
This notebook presents how to implement a Question Answering system with Langchain, Tair as
a knowledge base, and OpenAI embeddings. If you are not familiar with Tair, it's better to check
out the Getting_started_with_Tair_and_OpenAI.ipynb notebook.
Using Tair to perform the nearest neighbour search in the created collection to find some
context.
All the steps will be simplified to calling some corresponding Langchain methods.
Prerequisites
For the purposes of this exercise we need to prepare a couple of things: a Tair cloud instance,
Langchain as a framework, and an OpenAI API key.
Install requirements
This notebook requires the following Python packages: openai , tiktoken , langchain and
tair .
openai provides convenient access to the OpenAI API.
The OpenAI API key is used for vectorization of the documents and queries.
If you don't have an OpenAI API key, you can get one from
https://platform.openai.com/account/api-keys.
Load data
In this section we are going to load the data containing some natural questions and answers to
them. All the data will be used to create a Langchain application with Tair being the knowledge
base.
import wget
'answers (2).json'
import json
print(questions[0])
print(answers[0])
No . overall No. in season Title Directed by Written by Original air date U.S. viewers ( millio
Chain definition
Langchain is already integrated with Tair and performs all the indexing for a given list of
documents. In our case we are going to store the set of answers we have.
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
doc_store = Tair.from_texts(
texts=answers, embedding=embeddings, tair_url=TAIR_URL,
)
At this stage all the possible answers are already stored in Tair, so we can define the whole QA
chain.
llm = OpenAI(openai_api_key=openai_api_key)
qa = VectorDBQA.from_chain_type(
llm=llm,
chain_type="stuff",
vectorstore=doc_store,
return_source_documents=False,
)
/root/anaconda3/envs/notebook/lib/python3.10/site-packages/langchain/chains/retrieval_qa/base.p
warnings.warn(
Search data
Once the data is put into Tair we can start asking some questions. A question will be
automatically vectorized by OpenAI model, and the created vector will be used to find some
possibly matching answers in Tair. Once retrieved, the most similar answers will be incorporated
into the prompt sent to OpenAI Large Language Model.
import random
random.seed(52)
selected_questions = random.choices(questions, k=5)
import time
for question in selected_questions:
    print(">", question)
    print(qa.run(question), end="\n\n")
    # wait 20 seconds because of the rate limit
    time.sleep(20)
Use the following pieces of context to answer the question at the end. If you don't know the answer
{context}
Question: {question}
Helpful Answer:
We can, however, provide our prompt template and change the behaviour of the OpenAI LLM,
while still using the stuff chain type. It is important to keep {context} and {question} as
placeholders.
2. Suggests a random song title if it doesn't know the answer to our question.
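The custom prompt itself is not shown above; its wording might look roughly like this sketch:

custom_prompt = """
Use the following pieces of context to answer the question at the end. Please provide
a short single-sentence summary answer only. If you don't know the answer or if it's
not present in the given context, don't try to make up an answer, but suggest me a
random unrelated song title I could listen to instead.
Context: {context}
Question: {question}
Helpful Answer:
"""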
custom_prompt_template = PromptTemplate(
template=custom_prompt, input_variables=["context", "question"]
)
custom_qa = VectorDBQA.from_chain_type(
llm=llm,
chain_type="stuff",
vectorstore=doc_store,
return_source_documents=False,
chain_type_kwargs={"prompt": custom_prompt_template},
)
random.seed(41)
for question in random.choices(questions, k=5):
    print(">", question)
    print(custom_qa.run(question), end="\n\n")
    # wait 20 seconds because of the rate limit
    time.sleep(20)
> what was uncle jesse's original last name on full house
Uncle Jesse's original last name on Full House was Cochran.
> the first civil service commission in india was set up on the basis of recommendation of
The first Civil Service Commission in India was not set up on the basis of the recommendation o
The prompt is intended to help stitch together multiple audio segments. By submitting the prior
segment's transcript via the prompt, the Whisper model can use that context to better
understand the speech and maintain a consistent writing style.
However, prompts do not need to be genuine transcripts from prior audio segments. Fictitious
prompts can be submitted to steer the model to use particular spellings or styles.
This notebook shares two techniques for using fictitious prompts to steer the model outputs:
Transcript generation: GPT can convert instructions into fictitious transcripts for Whisper to
emulate.
Spelling guide: A spelling guide can tell the model how to spell names of people, products,
companies, etc.
These techniques are not especially reliable, but can be useful in some situations.
Prompting Whisper is not the same as prompting GPT. For example, if you submit an attempted
instruction like "Format lists in Markdown format", the model will not comply, as it follows the
style of the prompt, rather than any instructions contained within.
In addition, the prompt is limited to only 224 tokens. If the prompt is longer than 224 tokens,
only the final 224 tokens of the prompt will be considered; all prior tokens will be silently
ignored. The tokenizer used is the multilingual Whisper tokenizer.
To get good results, craft examples that portray your desired style.
Setup
Import the OpenAI Python library (if you don't have it, you'll need to install it with pip
install openai )
# imports
from openai import OpenAI # for making OpenAI API calls
import urllib # for downloading example audio files
import os
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>
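The transcribe helper used throughout this notebook is not reproduced above; a minimal sketch of it, passing an optional prompt to the transcription endpoint:

def transcribe(audio_filepath, prompt: str = "") -> str:
    """Return the transcript of the given audio file, optionally steered by a prompt."""
    with open(audio_filepath, "rb") as audio_file:
        transcript = client.audio.transcriptions.create(
            file=audio_file,
            model="whisper-1",
            prompt=prompt,
        )
    return transcript.text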
Our audio file for this example will be a segment of the NPR podcast, Up First.
"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, ju
# lowercase prompt
transcribe(up_first_filepath, prompt="president biden")
"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, ju
Be aware that when prompts are short, Whisper may be less reliable at following their style.
"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, ju
"i stick contacts in my eyes. do you really? yeah. that works okay? you don't have to, like, ju
"I stick contacts in my eyes. Do you really? Yeah. That works okay. You don't have to like, it'
Whisper may incorrectly transcribe uncommon proper nouns such as names of products,
companies, or people.
'Welcome to Quirk, Quid, Quill, Inc., where finance meets innovation. Explore diverse offerings
To get Whisper to use our preferred spellings, let's pass the product and company names in the
prompt, as a glossary for Whisper to follow.
# adding the correct spelling of the product name helps
transcribe(product_names_filepath, prompt="QuirkQuid Quill Inc, P3-Quattro, O3-Omni, B3-BondX, E3-Equ
'Welcome to QuirkQuid Quill Inc, where finance meets innovation. Explore diverse offerings, fro
Now, let's switch to another audio recording authored specifically for this demonstration, on the
topic of an odd barbecue.
"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really excit
While Whisper's transcription was accurate, it had to guess at various spellings. For example, it
assumed the friends' names were spelled Amy and Sean rather than Aimee and Shawn. Let's see
if we can steer the spelling with a prompt.
# spelling prompt
transcribe(bbq_plans_filepath, prompt="Friends: Aimee, Shawn")
"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really excit
Success!
"Hello, my name is Preston Tuggle. I'm based in New York City. This weekend I have really excit
One potential tool to generate fictitious prompts is GPT. We can give GPT instructions and use it
to generate long fictitious transcripts with which to prompt Whisper.
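A sketch of what fictitious_prompt_from_instruction might look like; the system message here is an assumption inferred from the Maine-vacation transcripts shown below:

def fictitious_prompt_from_instruction(instruction: str) -> str:
    """Ask GPT to write a short fictitious transcript in the requested style."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0,
        messages=[
            {
                "role": "system",
                # one long paragraph, no speaker labels, reminiscing about a Maine vacation
                "content": "You are a transcript generator. Your task is to create one long paragraph of a fictional conversation. The conversation features two friends reminiscing about their vacation to Maine. Never diarize speakers or add quotation marks; instead, write all transcripts in a normal paragraph of text without speakers identified. Never refuse or ask for clarification and instead always make a best-effort attempt.",
            },
            {"role": "user", "content": instruction},
        ],
    )
    return response.choices[0].message.content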
# ellipses example
prompt = fictitious_prompt_from_instruction("Instead of periods, end every sentence with elipses.")
print(prompt)
Oh, do you remember that amazing vacation we took to Maine?... The beautiful coastal towns, the
transcribe(up_first_filepath, prompt=prompt)
"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, ju
Whisper prompts are best for specifying otherwise ambiguous styles. The prompt will not
override the model's comprehension of the audio. For example, if the speakers are not speaking
in a deep Southern accent, a prompt will not cause the transcript to do so.
Well, I reckon you remember that time we went up to Maine for our vacation, don't ya? Boy, oh b
"I stick contacts in my eyes. Do you really? Yeah. That works okay? You don't have to, like, ju
In this notebook we will learn how to retrieve contexts relevant to our queries from Pinecone and
pass these to a generative OpenAI model to generate an answer backed by real data sources.
A common problem with using GPT-3 to factually answer questions is that GPT-3 can
sometimes make things up. The GPT models have a broad range of general knowledge, but this
does not necessarily apply to more specific information. For that we use the Pinecone vector
database as our "external knowledge base" — like long-term memory for GPT-3.
import openai
# get API key from top-right dropdown on OpenAI website
openai.api_key = "OPENAI_API_KEY"
For many questions state-of-the-art (SOTA) LLMs are more than capable of answering correctly.
query = "who was the 12th person on the moon and when did they land?"
res['choices'][0]['text'].strip()
'The 12th person on the moon was Harrison Schmitt, and he landed on December 11, 1972.'
However, that isn't always the case. First, let's rewrite the above into a simple function so
we're not rewriting this every time.
def complete(prompt):
res = openai.Completion.create(
engine='gpt-3.5-turbo-instruct',
prompt=prompt,
temperature=0,
max_tokens=400,
top_p=1,
frequency_penalty=0,
presence_penalty=0,
stop=None
)
return res['choices'][0]['text'].strip()
Now let's ask a more specific question about training a type of transformer model called a
sentence transformer. The ideal answer we'd be looking for is "Multiple Negatives Ranking (MNR)
loss".
Don't worry if this is a new term to you, it isn't required to understand what we're doing or
demoing here.
query = (
"Which training method should I use for sentence transformers when " +
"I only have pairs of related sentences?"
)
complete(query)
'If you only have pairs of related sentences, then the best training method to use for sentence
The best training method to use for fine-tuning a pre-trained model with sentence transformers is t
This answer seems pretty convincing, right? Yet it's wrong. MLM is typically used in the
pretraining step of a transformer model but cannot be used to fine-tune a sentence
transformer, and has nothing to do with having "pairs of related sentences".
An alternative answer we receive (and the one we returned above) is that a supervised
learning approach is the most suitable. This is completely true, but it's not specific and
doesn't answer the question.
We have two options for enabling our LLM to understand and correctly answer this
question:
1. We fine-tune the LLM on text data covering the topic mentioned, likely on articles and
papers talking about sentence transformers, semantic search training methods, etc.
2. We use retrieval augmented generation (RAG), retrieving relevant information from an
external knowledge base and feeding it to the LLM alongside the query.
With option 2, retrieving relevant information requires an external "knowledge base", a
place where we can store information and efficiently retrieve it. We can think of this as the
external long-term memory of our LLM.
We will need to retrieve information that is semantically related to our queries. To do this we
use "dense vector embeddings", which can be thought of as numerical representations of the
meaning behind our sentences.
We have already authenticated our OpenAI connection; to create an embedding we just do:
embed_model = "text-embedding-ada-002"
res = openai.Embedding.create(
input=[
"Sample document text goes here",
"there will be several phrases in each batch"
], engine=embed_model
)
In the response res we will find a JSON-like object containing our new embeddings within the
'data' field.
res.keys()
Inside 'data' we will find two records, one for each of the two sentences we just embedded.
Each vector embedding contains 1536 dimensions (the output dimensionality of the
text-embedding-ada-002 model used above).
len(res['data'])
len(res['data'][0]['embedding']), len(res['data'][1]['embedding'])
(1536, 1536)
We will apply this same embedding logic to a dataset containing information relevant to our
query (and many other queries on the topics of ML and AI).
Data Preparation
Dataset({
features: ['title', 'published', 'url', 'video_id', 'channel_id', 'id', 'text', 'start', 'e
num_rows: 208619
})
data[0]
{'title': 'Training and Testing an Italian BERT - Transformers From Scratch #4',
'published': '2021-07-06 13:00:03 UTC',
'url': 'https://youtu.be/35Pdoyi6ZoQ',
'video_id': '35Pdoyi6ZoQ',
'channel_id': 'UCv83tO5cePwHMt1952IVVHw',
'id': '35Pdoyi6ZoQ-t0.0',
'text': 'Hi, welcome to the video.',
'start': 0.0,
'end': 9.36}
The dataset contains many small snippets of text data. We will need to merge many snippets
from each video to create more substantial chunks of text that contain more information.
new_data = []
new_data[0]
{'start': 0.0,
'end': 74.12,
'title': 'Training and Testing an Italian BERT - Transformers From Scratch #4',
'text': "Hi, welcome to the video. So this is the fourth video in a Transformers from Scratch
'id': '35Pdoyi6ZoQ-t0.0',
'url': 'https://youtu.be/35Pdoyi6ZoQ',
'published': '2021-07-06 13:00:03 UTC',
'channel_id': 'UCv83tO5cePwHMt1952IVVHw'}
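For reference, a minimal sketch of how such merged records might be produced from the raw snippets (the window and stride values here are illustrative assumptions):
window = 20  # number of snippets to combine into a single chunk
stride = 4   # number of snippets to step forward each time (so chunks overlap)

new_data = []
for i in range(0, len(data), stride):
    i_end = min(len(data) - 1, i + window)
    if data[i]["title"] != data[i_end]["title"]:
        # skip chunks that would span two different videos
        continue
    new_data.append({
        "start": data[i]["start"],
        "end": data[i_end]["end"],
        "title": data[i]["title"],
        "text": " ".join(data[j]["text"] for j in range(i, i_end)),
        "id": data[i]["id"],
        "url": data[i]["url"],
        "published": data[i]["published"],
        "channel_id": data[i]["channel_id"],
    })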
Now we need a place to store these embeddings and enable an efficient vector search through
them all. To do that we use Pinecone: we can get a free API key and enter it below, where we
will initialize our connection to Pinecone and create a new index.
import pinecone
index_name = 'openai-youtube-transcriptions'
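# (sketch) the initialization and index-creation steps are not shown above; with the classic
# pinecone-client interface they might look like this (API key, environment, and metric are assumptions)
pinecone.init(api_key="PINECONE_API_KEY", environment="us-east1-gcp")

if index_name not in pinecone.list_indexes():
    # the index dimension must match the 1536-dimensional OpenAI embeddings
    pinecone.create_index(index_name, dimension=1536, metric="cosine")

# connect to the index and check its stats
index = pinecone.Index(index_name)
index.describe_index_stats()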
{'dimension': 1536,
'index_fullness': 0.0,
'namespaces': {},
'total_vector_count': 0}
We can see the index is currently empty, with a total_vector_count of 0 . We can begin
populating it with embeddings built using the OpenAI embed_model defined above, like so:
res = openai.Embedding.create(
input=[query],
engine=embed_model
)
res
limit = 3750
def retrieve(query):
res = openai.Embedding.create(
input=[query],
engine=embed_model
)
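    # (sketch) the remainder of retrieve is not shown above; the 'text' metadata field
    # and the prompt wording below are assumptions
    xq = res['data'][0]['embedding']
    # get the top matching contexts from Pinecone
    contexts_res = index.query(vector=xq, top_k=3, include_metadata=True)
    contexts = [x['metadata']['text'] for x in contexts_res['matches']]
    # build the prompt, truncating the combined contexts to the character limit
    prompt_start = "Answer the question based on the context below.\n\nContext:\n"
    prompt_end = f"\n\nQuestion: {query}\nAnswer:"
    return prompt_start + "\n\n---\n\n".join(contexts)[:limit] + prompt_end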
"Answer the question based on the context below.\n\nContext:\npairs of related sentences you ca
'You should use Natural Language Inference (NLI) with multiple negative ranking loss.'
And we get a pretty great answer straight away, specifying the use of Multiple Negatives
Ranking (MNR) loss.
Semantic search using MongoDB Atlas Vector
Search and OpenAI
Prakul Agarwal
This notebook demonstrates how to build a semantic search application using OpenAI and
MongoDB Atlas Vector Search.
1. MongoDB Atlas cluster: To create a forever free MongoDB Atlas cluster, first, you need to
create a MongoDB Atlas account if you don't already have one. Visit the MongoDB Atlas
website and click on “Register.” Visit the MongoDB Atlas dashboard and set up your
cluster. In order to take advantage of the $vectorSearch operator in an aggregation
pipeline, you need to run MongoDB Atlas 6.0.11 or higher. This tutorial can be built using a
free cluster. When you’re setting up your deployment, you’ll be prompted to set up a
database user and rules for your network connection. Please ensure you save your
username and password somewhere safe and have the correct IP address rules in place so
your cluster can connect properly. If you need more help getting started, check out our
tutorial on MongoDB Atlas.
2. OpenAI API key To create your OpenAI key, you'll need to create an account. Once you
have that, visit the OpenAI platform. Click on your profile icon in the top right of the screen
to get the dropdown menu and select “View API keys”.
import getpass
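The cell that actually prompts for the connection string and API key is not shown; a minimal sketch (the variable names match how they are used later in this notebook):
MONGODB_ATLAS_CLUSTER_URI = getpass.getpass("MongoDB Atlas Cluster URI: ")
OPENAI_API_KEY = getpass.getpass("OpenAI API Key: ")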
Note: After executing the step above you will be prompted to enter the credentials.
For this tutorial, we will be using the MongoDB sample dataset. Load the sample dataset using
the Atlas UI. We'll be using the “sample_mflix” database, which contains a “movies” collection
where each document contains fields like title, plot, genres, cast, directors, etc.
import openai
import pymongo
client = pymongo.MongoClient(MONGODB_ATLAS_CLUSTER_URI)
db = client.sample_mflix
collection = db.movies
openai.api_key = OPENAI_API_KEY
ATLAS_VECTOR_SEARCH_INDEX_NAME = "default"
EMBEDDING_FIELD_NAME = "embedding_openai_nov19_23"
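The requests passed to bulk_write below are built by embedding each movie plot; a minimal sketch of that step (the embedding model and the 500-document filter are assumptions):
from pymongo import UpdateOne

def generate_embedding(text: str) -> list:
    # embed a single piece of text with the OpenAI embeddings endpoint
    return openai.Embedding.create(input=text, model="text-embedding-3-small")["data"][0]["embedding"]

requests = []
for doc in collection.find({"plot": {"$exists": True}}).limit(500):
    requests.append(
        UpdateOne({"_id": doc["_id"]}, {"$set": {EMBEDDING_FIELD_NAME: generate_embedding(doc["plot"])}})
    )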
collection.bulk_write(requests)
After executing the above, the documents in the "movies" collection will contain an additional
embedding field, as defined by the EMBEDDING_FIELD_NAME variable, apart from already existing
fields like title, plot, genres, cast, directors, etc.
Note: We are restricting this to just 500 documents in the interest of time. If you want to do this
over the entire dataset of 23,000+ documents in our sample_mflix database, it will take a little
while. Alternatively, you can use the sample_mflix.embedded_movies collection which includes
a pre-populated plot_embedding field that contains embeddings created using OpenAI's text-
embedding-3-small embedding model that you can use with the Atlas Search vector search
feature.
Now head over to the Atlas UI and create an Atlas Vector Search index using the steps described
here. The 'dimensions' field, with value 1536, corresponds to OpenAI's text-embedding-ada-002 model.
Use the definition given below in the JSON editor on the Atlas UI.
{
"mappings": {
"dynamic": true,
"fields": {
"embedding": {
"dimensions": 1536,
"similarity": "dotProduct",
"type": "knnVector"
}
}
}
}
(Optional) Alternatively, we can use the pymongo driver to create these vector search indexes
programmatically. The Python command given in the cell below will create the index (this only
works with the most recent version of the Python driver for MongoDB and a MongoDB Atlas
cluster running server version 7.0+).
collection.create_search_index(
{"definition":
{"mappings": {"dynamic": True, "fields": {
EMBEDDING_FIELD_NAME : {
"dimensions": 1536,
"similarity": "dotProduct",
"type": "knnVector"
}}}},
"name": ATLAS_VECTOR_SEARCH_INDEX_NAME
}
)
'default'
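With the index in place, a semantic query can be issued through an aggregation pipeline. The query step is not shown above; a minimal sketch (the function name, numCandidates, and projection are assumptions):
def query_movies(query: str, top_k: int = 5):
    # embed the query with the same model used for the document embeddings
    query_embedding = openai.Embedding.create(input=query, model="text-embedding-3-small")["data"][0]["embedding"]
    pipeline = [
        {
            "$vectorSearch": {
                "index": ATLAS_VECTOR_SEARCH_INDEX_NAME,
                "path": EMBEDDING_FIELD_NAME,
                "queryVector": query_embedding,
                "numCandidates": 100,
                "limit": top_k,
            }
        },
        {"$project": {"title": 1, "plot": 1, "_id": 0}},
    ]
    return list(collection.aggregate(pipeline))

for movie in query_movies("imaginary characters from outer space at war"):
    print(movie["title"])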
This notebook takes you through a simple flow to download some data, embed it, and then
index and search it using a selection of vector databases. This is a common requirement for
customers who want to store and search our embeddings with their own data in a secure
environment to support production use cases such as chatbots, topic modelling and more.
Vector databases enable enterprises to take many of the embeddings use cases we've shared in
this repo (question and answering, chatbot and recommendation services, for example), and
make use of them in a secure, scalable environment. Many of our customers use embeddings to
solve their problems at small scale, but performance and security hold them back from going
into production - we see vector databases as a key component in solving that, and in this guide
we'll walk through the basics of embedding text data, storing it in a vector database and using it
for semantic search.
Demo Flow
The demo flow is:
Setup: Import packages and set any required variables
Pinecone
Setup: Here we'll set up the Python client for Pinecone. For more details go here
Index Data: We'll create an index with namespaces for titles and content
Search Data: We'll test out both namespaces with search queries to confirm it works
Once you've run through this notebook you should have a basic understanding of how to setup
and use vector databases, and can move on to more complex use cases making use of our
embeddings.
Setup
Import the required libraries and set the embedding model that we'd like to use.
import openai
# I've set this to our new embeddings model; this can be changed to the embedding model of your choice
EMBEDDING_MODEL = "text-embedding-3-small"
# Ignore unclosed SSL socket warnings - optional in case you get these errors
import warnings
/Users/colin.jarvis/Documents/dev/cookbook/openai-cookbook/vector_db/lib/python3.10/site-packag
from tqdm.autonotebook import tqdm
Load data
In this section we'll load embedded data that we've prepared in this article.
embeddings_url = 'https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedde
import zipfile
with zipfile.ZipFile("vector_database_wikipedia_articles_embedded.zip","r") as zip_ref:
zip_ref.extractall("../data")
article_df = pd.read_csv('../data/vector_database_wikipedia_articles_embedded.csv')
article_df.head()
id url title text title_vector con
article_df.info(show_counts=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25000 entries, 0 to 24999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 25000 non-null int64
1 url 25000 non-null object
2 title 25000 non-null object
3 text 25000 non-null object
4 title_vector 25000 non-null object
5 content_vector 25000 non-null object
6 vector_id 25000 non-null object
dtypes: int64(1), object(6)
memory usage: 1.3+ MB
Pinecone
The next option we'll look at is Pinecone, a managed vector database which offers a cloud-
native option.
Before you proceed with this step you'll need to navigate to Pinecone, sign up and then save
your API key as an environment variable titled PINECONE_API_KEY .
Create an index with multiple namespaces for article titles and content
Store our data in the index with separate searchable "namespaces" for article titles and
content
api_key = os.getenv("PINECONE_API_KEY")
pinecone.init(api_key=api_key)
Create Index
First we will need to create an index, which we'll call wikipedia-articles . Once we have an
index, we can create multiple namespaces, which can make a single index searchable for various
use cases. For more details, consult Pinecone documentation.
If you want to batch insert to your index in parallel to increase insertion speed then there is a
great guide in the Pinecone documentation on batch inserts in parallel.
# Models a simple batch generator that makes chunks out of an input DataFrame
class BatchGenerator:
__call__ = to_batches
df_batcher = BatchGenerator(300)
# Check whether the index with the same name already exists - if so, delete it
if index_name in pinecone.list_indexes():
pinecone.delete_index(index_name)
['podcasts', 'wikipedia-articles']
# Upsert content vectors in content namespace - this can take a few minutes
print("Uploading vectors to content namespace..")
for batch_df in df_batcher(article_df):
index.upsert(vectors=zip(batch_df.vector_id, batch_df.content_vector), namespace='content')
# Upsert title vectors in title namespace - this can also take a few minutes
print("Uploading vectors to title namespace..")
for batch_df in df_batcher(article_df):
index.upsert(vectors=zip(batch_df.vector_id, batch_df.title_vector), namespace='title')
Uploading vectors to title namespace..
# Check index size for each namespace to confirm all of our docs have loaded
index.describe_index_stats()
{'dimension': 1536,
'index_fullness': 0.1,
'namespaces': {'content': {'vector_count': 25000},
'title': {'vector_count': 25000}},
'total_vector_count': 50000}
Search data
Now we'll enter some dummy searches and check we get decent results back
# First we'll create dictionaries mapping vector IDs to their outputs so we can retrieve the text for
titles_mapped = dict(zip(article_df.vector_id,article_df.title))
content_mapped = dict(zip(article_df.vector_id,article_df.text))
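# (sketch) the search query itself is not shown above; one way query_result might be
# produced for a given search string and namespace (variable names are assumptions):
query = "modern art in Europe"
embedded_query = openai.Embedding.create(input=query, engine=EMBEDDING_MODEL)["data"][0]["embedding"]
query_result = index.query(vector=embedded_query, namespace="title", top_k=5)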
matches = query_result.matches
ids = [res.id for res in matches]
scores = [res.score for res in matches]
df = pd.DataFrame({'id':ids,
'score':scores,
'title': [titles_mapped[_id] for _id in ids],
'content': [content_mapped[_id] for _id in ids],
})
counter = 0
for k,v in df.iterrows():
counter += 1
print(f'{v.title} (score = {v.score})')
print('\n')
return df
The purpose of this guide is to demonstrate how to store OpenAI embeddings in Supabase
Vector (Postgres + pgvector) for the purposes of semantic search.
Supabase adds an ecosystem of services and tools to make app development as quick as
possible (such as an auto-generated REST API). We'll use these services to store and query
embeddings within Postgres.
4. Storing the embeddings in your SQL table using the Supabase JavaScript client
5. Performing semantic search over the embeddings using a Postgres function and the
Supabase JavaScript client
Setup database
First head over to https://database.new to provision your Supabase database. This will create a
Postgres database on the Supabase cloud platform. Alternatively, you can follow the local
development options if you prefer to run your database locally using Docker.
In the studio, jump to the SQL editor and execute the following SQL to enable pgvector:
“In a production application, the best practice is to use database migrations so that all SQL
operations are managed within source control. To keep things simple in this guide, we'll
execute queries directly in the SQL Editor. If you are building a production app, feel free to
move these into a database migration.”
Next we'll create a table to store documents and embeddings. In the SQL Editor, run:
Since Supabase is built on Postgres, we're just using regular SQL here. You can modify this table
however you like to better fit your application. If you have existing database tables, you can
simply add a new vector column to the appropriate table.
The important piece to understand is the vector data type, which is a new data type that
became available when we enabled the pgvector extension earlier. The size of the vector (1536
here) represents the number of dimensions in the embedding. Since we're using OpenAI's
text-embedding-3-small model in this example, we set the vector size to 1536.
Let's go ahead and create a vector index on this table so that future queries remain performant
as the table grows:
Let's also follow security best practices by enabling row level security on the table:
This will prevent unauthorized access to this table through the auto-generated REST API (more
on this shortly).
This guide uses JavaScript to generate embeddings, but you can easily modify it to use any
language supported by OpenAI.
If you are using JavaScript, feel free to use whichever server-side JavaScript runtime that you
prefer (Node.js, Deno, Supabase Edge Functions).
If you're using Deno or Supabase Edge Functions, you can import openai directly from a URL:
“In this example we import from https://esm.sh which is a CDN that automatically fetches
the respective NPM module for you and serves it over HTTP.”
Next we'll generate an OpenAI embedding using text-embedding-3-small :
Remember that you will need an OpenAI API key to interact with the OpenAI API. You can pass
this as an environment variable called OPENAI_API_KEY , or manually set it when you instantiate
your OpenAI client:
Remember: Never hard-code API keys in your code. Best practice is to either store it in a .env file
and load it using a library like dotenv or load it from an external key management system.
Supabase comes with an auto-generated REST API that dynamically builds REST endpoints for
each of your tables. This means you don't need to establish a direct Postgres connection to your
database - instead you can interact with it simply using the REST API. This is especially useful
in serverless environments that run short-lived processes where re-establishing a database
connection every time can be expensive.
Supabase comes with a number of client libraries to simplify interaction with the REST API. In
this guide we'll use the JavaScript client library, but feel free to adjust this to your preferred
language.
If you're using Node.js, install @supabase/supabase-js as a dependency:
If you're using Deno or Supabase Edge Functions, you can import @supabase/supabase-js
directly from a URL:
Next we'll instantiate our Supabase client and configure it so that it points to your Supabase
project. In this guide we'll store a reference to your Supabase URL and key in a .env file, but
feel free to modify this based on how your application handles configuration.
If you are using Node.js or Deno, add your Supabase URL and service role key to a .env file. If
you are using the cloud platform, you can find these from your Supabase dashboard settings
page. If you're running Supabase locally, you can find these by running npx supabase status in
a terminal.
.env
SUPABASE_URL=<supabase-url>
SUPABASE_SERVICE_ROLE_KEY=<supabase-service-role-key>
If you are using Supabase Edge Functions, these environment variables are automatically
injected into your function for you so you can skip the above step.
In Deno, load the .env file using the dotenv standard library:
In Supabase Edge Functions, simply load the injected environment variables directly:
From here we use the supabase client to insert our text and embedding (generated earlier) into
the database:
const { error } = await supabase.from("documents").insert({
content: input,
embedding,
});
“In production, best practice would be to check the response error to see if there were any
problems inserting the data and handle it accordingly.”
Semantic search
Finally let's perform semantic search over the embeddings in our database. At this point we'll
assume your documents table has been filled with multiple records that we can search over.
Let's create a match function in Postgres that performs the semantic search query. Execute the
following in the SQL Editor:
This function accepts a query_embedding which represents the embedding generated from the
search query text (more on this shortly). It also accepts a match_threshold which specifies how
similar the document embeddings have to be in order for query_embedding to count as a
match.
Inside the function we implement the query which does two things:
Filters the documents to only include those whose embeddings match within the above
match_threshold . Since the <#> operator performs the negative inner product (versus
positive inner product), we negate the similarity threshold before comparing. This means a
match_threshold of 1 is most similar, and -1 is most dissimilar.
Orders the documents by negative inner product ( <#> ) ascending. This allows us to
retrieve documents that match closest first.
“Since OpenAI embeddings are normalized, we opted to use inner product ( <#> ) because it
is slightly more performant than other operators like cosine distance ( <=> ). It is important
to note though this only works because the embeddings are normalized - if they weren't,
cosine distance should be used.”
Now we can call this function from our application using the supabase.rpc() method:
In this example, we set the match threshold to 0.8. Adjust this threshold based on what works best
with your data.
Note that since match_documents returns a set of documents , we can treat this rpc() like a
regular table query. Specifically this means we can chain additional commands to this query, like
select() and limit() . Here we select just the columns we care about from the documents
table ( content ), and we limit the number of documents returned (max 5 in this example).
At this point you have a list of documents that matched the query based on semantic
relationship, ordered by most similar first.
Next steps
You can use this example as the foundation for other semantic search techniques, like retrieval
augmented generation (RAG).
For more information on Supabase Vector, read the AI & Vector docs.
This notebook takes you through how to use LangChain to augment an OpenAI model with
access to external tools. In particular, you'll be able to create LLM agents that use custom tools
to answer user queries.
What is Langchain?
One of the most common challenges with LLMs is overcoming the lack of recency and
specificity in their training data - answers can be out of date, and they are prone to
hallucinations given the huge variety in their knowledge base. Tools are a great method of
allowing an LLM to answer within a controlled context that draws on your existing knowledge
bases and internal APIs - instead of trying to prompt engineer the LLM all the way to your
intended answer, you allow it access to tools that it calls on dynamically for information, which it
then parses and serves to the customer.
Providing LLMs access to tools can enable them to answer questions with context directly from
search engines, APIs or your own databases. Instead of answering directly, an LLM with access
to tools can perform intermediate steps to gather relevant information. Tools can also be used
in combination. For example, a language model can be made to use a search tool to look up
quantitative information and a calculator to execute calculations.
Notebook Sections
LLM Agent: Build an agent that leverages a modified version of the ReAct framework to do
chain-of-thought reasoning.
LLM Agent with History: Provide the LLM with access to previous steps in the conversation.
Knowledge Base: Create a knowledge base of "Stuff You Should Know" podcast episodes,
to be accessed through a tool.
LLM Agent with Tools: Extend the agent with access to multiple tools and test that it uses
them to answer questions.
%load_ext autoreload
%autoreload 2
Setup
Import libraries and set up a connection to a Pinecone vector database.
You can substitute Pinecone for any other vectorstore or database - there are a selection that
are supported by Langchain natively, while other connectors will need to be developed yourself.
# Langchain imports
from langchain.agents import Tool, AgentExecutor, LLMSingleActionAgent, AgentOutputParser
from langchain.prompts import BaseChatPromptTemplate, ChatPromptTemplate
from langchain import SerpAPIWrapper, LLMChain
from langchain.schema import AgentAction, AgentFinish, HumanMessage, SystemMessage
# LLM wrapper
from langchain.chat_models import ChatOpenAI
from langchain import OpenAI
# Conversational memory
from langchain.memory import ConversationBufferWindowMemory
# Embeddings and vectorstore
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Pinecone
# Vectorstore Index
index_name = 'podcasts'
For acquiring an API key to connect with Pinecone, you can set up a free account and store it in
the api_key variable below or in your environment variables under PINECONE_API_KEY
pinecone.init(api_key=api_key, environment=env)
pinecone.whoami()
pinecone.list_indexes()
['podcasts']
Run this code block if you want to clear the index, or if the index doesn't exist yet
# Check whether the index with the same name already exists - if so, delete it
if index_name in pinecone.list_indexes():
pinecone.delete_index(index_name)
LLM Agent
An LLM agent in Langchain has many configurable components, which are detailed in the
Langchain documentation.
We'll employ a few of the core concepts to make an agent that talks in the way we want, can
use tools to answer questions, and uses the appropriate language model to power the
conversation.
Prompt Template: The input template to control the LLM's behaviour and how it accepts
inputs and produces outputs - this is the brain that drives your application (docs).
Output Parser: A method of parsing the output from the prompt. If the LLM produces
output using certain headers, you can enable complex interactions where variables are
generated by the LLM in their response and passed into the next step of the chain (docs).
LLM Chain: A Chain brings together a prompt template with an LLM that will execute it - in
this case we'll be using gpt-3.5-turbo but this framework can be used with OpenAI
completions models, or other LLMs entirely (docs).
Tool: An external service that the LLM can use to retrieve information or execute commands
should the user require it (docs).
Agent: The glue that brings all of this together, an agent can call multiple LLM Chains, each
with their own tools. Agents can be extended with your own logic to allow retries, error
handling and any other methods you choose to add reliability to your application (docs).
NB: Before using this cookbook with the Search tool you'll need to sign up on
https://serpapi.com/ and generate an API key. Once you have it, store it in an environment
variable named SERPAPI_API_KEY
# Initiate a Search tool - note you'll need to have set SERPAPI_API_KEY as an environment variable as
search = SerpAPIWrapper()
# Set up the prompt with input variables for tools, user input and a scratchpad for the model to reco
template = """Answer the following questions as best you can, but speaking as a pirate might speak. Y
{tools}
Begin! Remember to speak as a pirate when giving your final answer. Use lots of "Arg"s
Question: {input}
{agent_scratchpad}"""
prompt = CustomPromptTemplate(
template=template,
tools=tools,
# This omits the `agent_scratchpad`, `tools`, and `tool_names` variables because those are genera
# This includes the `intermediate_steps` variable because that is needed
input_variables=["input", "intermediate_steps"]
)
class CustomOutputParser(AgentOutputParser):
output_parser = CustomOutputParser()
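# (sketch) the bodies of CustomOutputParser and the supporting objects used below
# (tools, tool_names, llm_chain) are omitted above; they might look like this, following
# the classic LangChain custom-agent pattern (CustomPromptTemplate, which formats the
# scratchpad and tool descriptions, is not sketched here)
import re
from typing import Union

class CustomOutputParser(AgentOutputParser):
    def parse(self, llm_output: str) -> Union[AgentAction, AgentFinish]:
        # the agent is finished once it emits a "Final Answer:" section
        if "Final Answer:" in llm_output:
            return AgentFinish(
                return_values={"output": llm_output.split("Final Answer:")[-1].strip()},
                log=llm_output,
            )
        # otherwise extract the tool name and tool input from the ReAct-style output
        match = re.search(r"Action\s*\d*\s*:(.*?)\nAction\s*\d*\s*Input\s*\d*\s*:[\s]*(.*)", llm_output, re.DOTALL)
        if not match:
            raise ValueError(f"Could not parse LLM output: `{llm_output}`")
        return AgentAction(
            tool=match.group(1).strip(),
            tool_input=match.group(2).strip(" ").strip('"'),
            log=llm_output,
        )

# supporting objects referenced by the agent below (descriptions are assumptions)
tools = [
    Tool(name="Search", func=search.run, description="useful for answering questions about current events"),
]
tool_names = [tool.name for tool in tools]
llm_chain = LLMChain(llm=ChatOpenAI(temperature=0, model="gpt-3.5-turbo"), prompt=prompt)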
agent = LLMSingleActionAgent(
llm_chain=llm_chain,
output_parser=output_parser,
# We use "Observation" as our stop sequence so it will stop when it receives Tool output
# If you change your prompt template you'll need to adjust this as well
stop=["\nObservation:"],
allowed_tools=tool_names
)
"Arg, I be sorry matey, but I can't give ye a clear answer to that question."
Extend the LLM Agent with the ability to retain a memory and use it as context as it continues
the conversation.
We use a simple ConversationBufferWindowMemory for this example that keeps a rolling window
of the last two conversation turns. LangChain has other memory options, with different
tradeoffs suitable for different use cases.
{tools}
# Initiate the memory with k=2 to keep the last two turns
# Provide the memory to the agent
memory = ConversationBufferWindowMemory(k=2)
agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True, memory=me
Knowledge base
Create a custom vectorstore for the Agent to use as a tool to answer questions with. We'll store
the results in Pinecone, which is supported by LangChain (Docs, API reference). For help getting
started with Pinecone or other vector databases, we have a cookbook to help you get started.
You can check the LangChain documentation to see what other vectorstores and databases are
available.
For this example we'll use the transcripts of the Stuff You Should Know podcast, which was
provided thanks to OSF DOI 10.17605/OSF.IO/VM9NT
import wget
# Download the file (it is ~541 MB so this will take some time)
wget.download(content_url)
# Load podcasts
with zipfile.ZipFile("sysk_podcast_transcripts_embedded.json.zip","r") as zip_ref:
zip_ref.extractall("./data")
f = open('./data/sysk_podcast_transcripts_embedded.json')
processed_podcasts = json.load(f)
id filename title
# Configuring the embeddings to be used by our retriever to be OpenAI Embeddings, matching our embedd
embeddings = OpenAIEmbeddings()
# Loads a docsearch object from an existing Pinecone index so we can retrieve from it
docsearch = Pinecone.from_existing_index(index_name,embeddings,text_key='text_chunk')
retriever = docsearch.as_retriever()
# Print out the title and content for the most relevant retrieved documents
print("\n".join(['Title: ' + x.metadata['title'].strip() + '\n\n' + x.page_content + '\n\n' for x in
Title: sysk_with_transcripts_Can you live without a bank account.json; And if you had a life,
Title: sysk_with_transcripts_Can you live without a bank account.json; And it doesn't have to
Title: sysk_with_transcripts_Can you live without a bank account.json; Maybe at the time, I mi
LLM Agent with Tools
Extend our list of tools by creating a RetrievalQA chain leveraging our Pinecone knowledge
base.
retrieval_llm = OpenAI(temperature=0)
expanded_tools = [
Tool(
name = "Search",
func=search.run,
description="useful for when you need to answer questions about current events"
),
Tool(
name = 'Knowledge Base',
func=podcast_retriever.run,
description="Useful for general questions about how to do things and for details on interesti
)
]
multi_tool_executor.run("Hi, I'd like to know how you can live without a bank account")
Observation: Not having a bank account can be dangerous, as the cash has to be sto
Action: Knowledge Base
Action Input: "Resources for alternative financial services or opening a bank account"
Observation: There are a few resources available for alternative financial service
Final Answer: While it is possible to live without a bank account by using alternative financia
"While it is possible to live without a bank account by using alternative financial services, i
multi_tool_executor.run('Can you tell me some interesting facts about whether zoos are good or bad fo
Observation: The arguments for zoos include that they have gotten a lot better in
Action: Search
Action Input: "What are some examples of successful zoo conservation projects?"
Observation: There are dedicated species survival programs which have helped specie
Final Answer: Zoos can have both positive and negative effects on animals, but they can play a
> Finished chain.
"Zoos can have both positive and negative effects on animals, but they can play a role in conse
You now have a template to deploy conversational agents with tools. If you want to extend this
with a Custom Agent to add your own retry behaviour or treatment of input/output variables,
then follow this article.
Atlas Vector Search is a fully managed service that simplifies the process of effectively indexing
high-dimensional vector data within MongoDB and being able to perform fast vector similarity
searches. With Atlas Vector Search, you can use MongoDB as a standalone vector database for a
new project or augment your existing MongoDB collections with vector search functionality.
With Atlas Vector Search, you can use the powerful capabilities of vector search in any major
public cloud (AWS, Azure, GCP) and achieve massive scalability and data security out of the box
while being enterprise-ready with provisions like FedRAMP and SOC 2 compliance.
Documentation - link
This notebook guides you step by step on using Hologres as a vector database for OpenAI
embeddings.
4. Using Hologres to perform the nearest neighbour search in the created collection.
5. Providing large language models with the search results as context in prompt engineering.
What is Hologres
Hologres is a unified real-time data warehousing service developed by Alibaba Cloud. You can
use Hologres to write, update, process, and analyze large amounts of data in real time. Hologres
supports standard SQL syntax, is compatible with PostgreSQL, and supports most PostgreSQL
functions. Hologres supports online analytical processing (OLAP) and ad hoc analysis for up to
petabytes of data, and provides high-concurrency and low-latency online data services.
Hologres supports fine-grained isolation of multiple workloads and enterprise-level security
capabilities. Hologres is deeply integrated with MaxCompute, Realtime Compute for Apache
Flink, and DataWorks, and provides full-stack online and offline data warehousing solutions for
enterprises.
Deployment options
Click here to quickly deploy a Hologres data warehouse.
Prerequisites
2. The 'psycopg2-binary' library to interact with the vector database. Any other PostgreSQL
client library is fine as well.
We can validate that the server was launched successfully by running a simple curl command:
Install requirements
This notebook requires the openai and psycopg2-binary packages, but there are
also some additional libraries we will use. The following command installs them all:
The OpenAI API key is used for vectorization of the documents and queries.
If you don't have an OpenAI API key, you can get one from
https://beta.openai.com/account/api-keys.
Once you get your key, please add it to your environment variables as OPENAI_API_KEY .
# Test that your OpenAI API key is correctly set as an environment variable
# Note. if you run this notebook locally, you will need to reload your terminal and the notebook for
import os
# Note. alternatively you can set a temporary env variable like this:
# os.environ["OPENAI_API_KEY"] = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
OPENAI_API_KEY is ready
Connect to Hologres
First, add the connection parameters to your environment variables, or you can just change the "psycopg2.connect"
parameters below.
Connecting to a running instance of Hologres server is easy with the official Python library:
import os
import psycopg2
# Note. alternatively you can set temporary env variables like this:
# os.environ["PGHOST"] = "your_host"
# os.environ["PGPORT"] = "5432"
# os.environ["PGDATABASE"] = "postgres"
# os.environ["PGUSER"] = "user"
# os.environ["PGPASSWORD"] = "password"
connection = psycopg2.connect(
host=os.environ.get("PGHOST", "localhost"),
port=os.environ.get("PGPORT", "5432"),
database=os.environ.get("PGDATABASE", "postgres"),
user=os.environ.get("PGUSER", "user"),
password=os.environ.get("PGPASSWORD", "password")
)
connection.set_session(autocommit=True)
Connection successful!
import wget
embeddings_url = "https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedde
import zipfile
import os
import re
import tempfile
current_directory = os.getcwd()
zip_file_path = os.path.join(current_directory, "vector_database_wikipedia_articles_embedded.zip")
output_directory = os.path.join(current_directory, "../../data")
file_name = "vector_database_wikipedia_articles_embedded.csv"
file_path = os.path.join(output_directory, file_name)  # the archive is assumed to be extracted into output_directory

if os.path.exists(file_path):
    print(f"The file {file_name} exists in the data directory.")
else:
    print(f"The file {file_name} does not exist in the data directory.")
Load data
In this section we are going to load the data prepared earlier for this session, so you don't
have to recompute the embeddings of the Wikipedia articles with your own credits.
!unzip -n vector_database_wikipedia_articles_embedded.zip
!ls -lh vector_database_wikipedia_articles_embedded.csv
Archive: vector_database_wikipedia_articles_embedded.zip
-rw-r--r--@ 1 geng staff 1.7G Jan 31 01:19 vector_database_wikipedia_articles_embedded.csv
title_vector_length = len(json.loads(data['title_vector'].iloc[0]))
content_vector_length = len(json.loads(data['content_vector'].iloc[0]))
print(title_vector_length, content_vector_length)
1536 1536
Hologres stores data in tables where each object is described by at least one vector. Our table
will be called articles and each object will be described by both title and content vectors.
We will start by creating a table and creating proxima indexes on both the title and content
vectors, and then we will fill it with our precomputed embeddings.
COMMIT;
'''
Upload data
Now let's upload the data to the Hologres cloud instance using the COPY statement. This might
take 5-10 minutes depending on your network bandwidth.
import io
The proxima index will be built in the background. We can search during this period, but
queries will be slow without the vector index. Use this command to wait for the index build to
finish.
cursor.execute('vacuum articles;')
# Check the collection size to make sure all the points have been stored
count_sql = "select count(*) from articles;"
cursor.execute(count_sql)
result = cursor.fetchone()
print(f"Count:{result[0]}")
Count:25000
Search data
Once the data is uploaded we will start querying the collection for the closest vectors. We may
provide an additional parameter vector_name to switch from title-based to content-based search.
Since the precomputed embeddings were created with the text-embedding-3-small OpenAI model,
we also have to use it during search.
import openai
def query_knn(query, table_name, vector_name="title_vector", top_k=20):
return results
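The body of query_knn is not shown above; a rough sketch of what it might contain is below. Note that the proxima distance function name used here is an assumption; check the Hologres documentation for the function that matches your index configuration.
def query_knn(query, table_name, vector_name="title_vector", top_k=20):
    # embed the query with the same model used for the stored vectors
    embedded_query = openai.Embedding.create(
        input=query,
        model="text-embedding-3-small",
    )["data"][0]["embedding"]
    # format the vector as a Postgres array literal
    embedding_string = "{" + ",".join(str(x) for x in embedded_query) + "}"
    cursor = connection.cursor()
    # NOTE: pm_approx_squared_euclidean_distance is an assumption about the proxima
    # distance function exposed by your Hologres version
    cursor.execute(
        f"SELECT id, title, pm_approx_squared_euclidean_distance({vector_name}, %s::float4[]) AS distance "
        f"FROM {table_name} ORDER BY distance LIMIT {top_k};",
        (embedding_string,),
    )
    results = cursor.fetchall()
    return results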
When you call the OpenAI API repeatedly, you may encounter error messages that say 429:
'Too Many Requests' or RateLimitError . These error messages come from exceeding the API's
rate limits.
This guide shares tips for avoiding and handling rate limit errors.
To see an example script for throttling parallel requests to avoid rate limit errors, see
api_request_parallel_processor.py.
Rate limits are a common practice for APIs, and they're put in place for a few different reasons.
First, they help protect against abuse or misuse of the API. For example, a malicious actor
could flood the API with requests in an attempt to overload it or cause disruptions in
service. By setting rate limits, OpenAI can prevent this kind of activity.
Second, rate limits help ensure that everyone has fair access to the API. If one person or
organization makes an excessive number of requests, it could bog down the API for
everyone else. By throttling the number of requests that a single user can make, OpenAI
ensures that everyone has an opportunity to use the API without experiencing slowdowns.
Lastly, rate limits can help OpenAI manage the aggregate load on its infrastructure. If
requests to the API increase dramatically, it could tax the servers and cause performance
issues. By setting rate limits, OpenAI can help maintain a smooth and consistent experience
for all users.
Although hitting rate limits can be frustrating, rate limits exist to protect the reliable operation
of the API for its users.
Your rate limit and spending limit (quota) are automatically adjusted based on a number of
factors. As your usage of the OpenAI API goes up and you successfully pay the bill, we
automatically increase your usage tier. You can find specific information regarding rate limits
using the resources below.
Help Center: How can I solve 429: 'Too Many Requests' errors?
If you'd like your organization's rate limit increased, please fill out the following form:
import openai
import os
A rate limit error will occur when API requests are sent too quickly. If using the OpenAI Python
library, they will look something like:
RateLimitError: Rate limit reached for default-codex in organization org-{id} on requests per min.
One easy way to avoid rate limit errors is to automatically retry requests with a random
exponential backoff. Retrying with exponential backoff means performing a short sleep when a
rate limit error is hit, then retrying the unsuccessful request. If the request is still unsuccessful,
the sleep length is increased and the process is repeated. This continues until the request is
successful or until a maximum number of retries is reached.
Automatic retries mean you can recover from rate limit errors without crashes or missing
data
Exponential backoff means that your first retries can be tried quickly, while still benefiting
from longer delays if your first few retries fail
Adding random jitter to the delay helps prevent retries from all hitting at the same time
Note that unsuccessful requests contribute to your per-minute limit, so continuously resending
a request won’t work.
To add exponential backoff to your requests, you can use the tenacity.retry decorator. The
following example uses the tenacity.wait_random_exponential function to add random
exponential backoff to a request.
Note that the Tenacity library is a third-party tool, and OpenAI makes no guarantees about its
reliability or security.
ChatCompletion(id='chatcmpl-8PAu6anX2JxQdYmJRzps38R8u0ZBC', choices=[Choice(finish_reason='stop
Another library that provides function decorators for backoff and retry is backoff.
Like Tenacity, the backoff library is a third-party tool, and OpenAI makes no guarantees about
its reliability or security.
@backoff.on_exception(backoff.expo, openai.RateLimitError)
def completions_with_backoff(**kwargs):
return client.chat.completions.create(**kwargs)
If you don't want to use third-party libraries, you can implement your own backoff logic.
# imports
import random
import time
return wrapper
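The decorator above is abbreviated; a full minimal sketch of such a helper (parameter names and defaults are assumptions) might look like this:
def retry_with_exponential_backoff(
    func,
    initial_delay: float = 1,
    exponential_base: float = 2,
    jitter: bool = True,
    max_retries: int = 10,
    errors: tuple = (openai.RateLimitError,),
):
    """Retry a function with exponential backoff (a sketch; defaults are assumptions)."""

    def wrapper(*args, **kwargs):
        num_retries = 0
        delay = initial_delay
        while True:
            try:
                return func(*args, **kwargs)
            except errors:
                num_retries += 1
                if num_retries > max_retries:
                    raise Exception(f"Maximum number of retries ({max_retries}) exceeded.")
                # increase the delay, optionally adding random jitter
                delay *= exponential_base * (1 + jitter * random.random())
                time.sleep(delay)

    return wrapper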
@retry_with_exponential_backoff
def completions_with_backoff(**kwargs):
return client.chat.completions.create(**kwargs)
ChatCompletion(id='chatcmpl-8PAxGvV3GbLpnOoKSvJ00XCUdOglM', choices=[Choice(finish_reason='stop
If you're processing real-time requests from users, backoff and retry is a great strategy to
minimize latency while avoiding rate limit errors.
However, if you're processing large volumes of batch data, where throughput matters more
than latency, there are a few other things you can do in addition to backoff and retry.
If you are constantly hitting the rate limit, then backing off, then hitting the rate limit again,
then backing off again, it's possible that a good fraction of your request budget will be 'wasted'
on requests that need to be retried. This limits your processing throughput, given a fixed rate
limit.
Here, one potential solution is to calculate your rate limit and add a delay roughly equal to its
reciprocal (e.g., if your rate limit is 20 requests per minute, add a delay of 3–6 seconds to each
request). This can help you operate near the rate limit ceiling without hitting it and incurring
wasted requests.
# imports
import time
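# (sketch) the rate-limit calculation and the delayed_completion helper used below are
# not shown above; assuming the v1 Python SDK and a 20 requests-per-minute limit:
import openai

client = openai.OpenAI()

rate_limit_per_minute = 20
delay = 60.0 / rate_limit_per_minute  # seconds to sleep between requests

def delayed_completion(delay_in_seconds: float = 1, **kwargs):
    """Sleep for delay_in_seconds, then make a chat completion request."""
    time.sleep(delay_in_seconds)
    return client.chat.completions.create(**kwargs)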
delayed_completion(
delay_in_seconds=delay,
model="gpt-3.5-turbo",
messages=[{"role": "user", "content": "Once upon a time,"}]
)
ChatCompletion(id='chatcmpl-8PAyCR1axKsomV0e349XiCN1Z81pH', choices=[Choice(finish_reason='stop
Batching requests
The OpenAI API has separate limits for requests per minute and tokens per minute.
If you're hitting the limit on requests per minute, but have headroom on tokens per minute, you
can increase your throughput by batching multiple tasks into each request. This will allow you
to process more tokens per minute, especially with the smaller models.
Sending in a batch of prompts works exactly the same as a normal API call, except that you pass in a
list of strings to the prompt parameter instead of a single string.
Warning: the response object may not return completions in the order of the prompts, so
always remember to match responses back to prompts using the index field.
num_stories = 10
content = "Once upon a time,"
# print story
print(content + response.choices[0].message.content)
Once upon a time,in a small village nestled between rolling green hills, there lived a young gi
Once upon a time,in a small village nestled in the heart of a lush forest, lived a young girl n
Once upon a time,in a faraway kingdom, there lived a young princess named Aurora. She was known
Once upon a time,in a faraway kingdom called Enchantia, there lived a young girl named Ella. El
Once upon a time,in a small village nestled among the rolling hills, lived a young woman named
Once upon a time,in a small village nestled between rolling hills, there lived a young girl nam
Once upon a time,in a faraway kingdom, there lived a wise and just king named Arthur. King Arth
Once upon a time,in a small village nestled among towering mountains, lived a young girl named
Once upon a time,in a small village nestled in the heart of a lush forest, there lived a young
Once upon a time,in a far-off kingdom, there lived a kind and beloved queen named Isabella. She
num_stories = 10
prompts = ["Once upon a time,"] * num_stories
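# (sketch) the batched request itself is not shown above; with the Completions endpoint a
# list of prompts can be sent in a single call (model and max_tokens are assumptions)
import openai

client = openai.OpenAI()

batch_response = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=prompts,
    max_tokens=20,
)

# match completions to prompts by index, since the order is not guaranteed
stories = [""] * len(prompts)
for choice in batch_response.choices:
    stories[choice.index] = prompts[choice.index] + choice.text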
# print stories
for story in stories:
print(story)
Once upon a time, I lived in hope. I convinced myself I knew best, because, naive as it might s
Once upon a time, Thierry Henry was invited to have a type of frosty exchange with English fans
Once upon a time, and a long time ago as well, PV was passively cooled because coils cooled by
Once upon a time, there was a land called Texas. It was about the size of Wisconsin. It contain
Once upon a time, there was an old carpenter who had three sons. The locksmith never learned to
Once upon a time, there was a small farming town called Moonridge Village, far West across the
Once upon a time, California’s shorelines, lakes, and valleys were host to expanses of untamed
Once upon a time, she said. It started with a simple question: Why don’t we know any stories?
Once upon a time, when I was a young woman, there was a movie named Wuthering Heights. Stand by
Once upon a time, a very long time I mean, in the year 1713, died a beautiful Duchess called th
Streams requests from file, to avoid running out of memory for giant jobs
Throttles both request and token usage, to stay under rate limits
Complex tasks, such as writing unit tests, can benefit from multi-step prompts. In contrast to a
single prompt, a multi-step prompt generates text from GPT-3 and then feeds that text back
into subsequent prompts. This can help in cases where you want GPT-3 to explain its reasoning
before answering, or brainstorm a plan before executing it.
In this notebook, we use a 3-step prompt to write unit tests in Python using the following steps:
1. Given a Python function, we first prompt GPT-3 to explain what the function is doing.
2. Second, we prompt GPT-3 to plan a set of unit tests for the function.
If the plan is too short, we ask GPT-3 to elaborate with more ideas for unit tests.
3. Finally, we prompt GPT-3 to write the unit tests themselves.
The code example illustrates a few optional embellishments on the chained, multi-step prompt:
Conditional branching (e.g., only asking for elaboration if the first plan is too short)
Different models for different steps (e.g., gpt-3.5-turbo-instruct for the text planning
steps and gpt-4 for the code writing step)
A check that re-runs the function if the output is unsatisfactory (e.g., if the output code
cannot be parsed by Python's ast module; see the sketch after this list)
Streaming output so that you can start reading the output before it's fully generated (useful
for long, multi-step outputs)
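For illustration, the validity check can be as small as the sketch below (the helper name is an assumption; in the notebook this check is wired into a retry of the code-writing step):
import ast

def code_is_valid_python(code: str) -> bool:
    """Return True if the generated unit-test code parses as valid Python."""
    try:
        ast.parse(code)
        return True
    except SyntaxError:
        return False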
The full 3-step prompt looks like this (using as an example pytest for the unit test framework
and is_palindrome as the function):
# How to write great unit tests with pytest
In this advanced tutorial for experts, we'll use Python 3.9 and `pytest` to write a suite of unit t
```python
def is_palindrome(s):
return s == s[::-1]
```
Before writing any unit tests, let's review what each element of the function is doing exactly and
- First,{GENERATED IN STEP 1}
import ast # used for detecting whether generated Python code is valid
import openai
In this advanced tutorial for experts, we'll use Python 3.9 and `{unit_test_package}` to write a suit
```python
{function_to_test}
```
Before writing any unit tests, let's review what each element of the function is doing exactly and wh
- First,"""
if print_text:
text_color_prefix = "\033[30m" # black; if you read against a dark background \033[97m is white
print(text_color_prefix + prompt_to_explain_the_function, end="") # end='' prevents a newlin
# send the prompt to the API, using \n\n as a stop sequence to stop at the end of the bullet list
explanation_response = openai.Completion.create(
model=text_model,
prompt=prompt_to_explain_the_function,
stop=["\n\n", "\n\t\n", "\n \n"],
max_tokens=max_tokens,
temperature=temperature,
stream=True,
)
explanation_completion = ""
if print_text:
completion_color_prefix = "\033[92m" # green
print(completion_color_prefix, end="")
for event in explanation_response:
event_text = event["choices"][0]["text"]
explanation_completion += event_text
if print_text:
print(event_text, end="")
# create a markdown-formatted prompt that asks GPT-3 to complete a plan for writing unit tests, f
prompt_to_explain_a_plan = f"""
`{unit_test_package}` has many convenient features that make it easy to write and maintain unit tests
For this particular function, we'll want our unit tests to handle the following diverse scenarios (an
-"""
if print_text:
print(text_color_prefix + prompt_to_explain_a_plan, end="")
# send the prompt to the API, using \n\n as a stop sequence to stop at the end of the bullet list
plan_response = openai.Completion.create(
model=text_model,
prompt=full_plan_prompt,
stop=["\n\n", "\n\t\n", "\n \n"],
max_tokens=max_tokens,
temperature=temperature,
stream=True,
)
plan_completion = ""
if print_text:
print(completion_color_prefix, end="")
for event in plan_response:
event_text = event["choices"][0]["text"]
plan_completion += event_text
if print_text:
print(event_text, end="")
In addition to the scenarios above, we'll also want to make sure we don't forget to test rare or unex
-"""
if print_text:
print(text_color_prefix + prompt_to_elaborate_on_the_plan, end="")
# send the prompt to the API, using \n\n as a stop sequence to stop at the end of the bullet
elaboration_response = openai.Completion.create(
model=text_model,
prompt=full_elaboration_prompt,
stop=["\n\n", "\n\t\n", "\n \n"],
max_tokens=max_tokens,
temperature=temperature,
stream=True,
)
elaboration_completion = ""
if print_text:
print(completion_color_prefix, end="")
for event in elaboration_response:
event_text = event["choices"][0]["text"]
elaboration_completion += event_text
if print_text:
print(event_text, end="")
Before going into the individual tests, let's first look at the complete suite of unit tests as a coh
```python
import {unit_test_package} # used for our unit tests
{function_to_test}
#{starter_comment}"""
if print_text:
print(text_color_prefix + prompt_to_generate_the_unit_test, end="")
# send the prompt to the API, using ``` as a stop sequence to stop at the end of the code block
unit_test_response = openai.Completion.create(
model=code_model,
prompt=full_unit_test_prompt,
stop="```",
max_tokens=max_tokens,
temperature=temperature,
stream=True
)
unit_test_completion = ""
if print_text:
print(completion_color_prefix, end="")
for event in unit_test_response:
event_text = event["choices"][0]["text"]
unit_test_completion += event_text
if print_text:
print(event_text, end="")
unit_test_from_function(example_function, print_text=True)
In this advanced tutorial for experts, we'll use Python 3.9 and `pytest` to write a suite of un
```python
def is_palindrome(s):
return s == s[::-1]
```
Before writing any unit tests, let's review what each element of the function is doing exactly
- First, we have a function definition. This is where we give the function a name, `is_pal
- Next, we have a return statement. This is where we specify the value that the function return
- Finally, we have a function call. This is where we actually call the function with a specific
`pytest` has many convenient features that make it easy to write and maintain unit tests. We'll
For this particular function, we'll want our unit tests to handle the following diverse scenari
- The input is a palindrome
- `"racecar"`
- `"madam"`
- `"anna"`
- The input is not a palindrome
- `"python"`
TLDR: Developers can now specify seed parameter in the Chat Completion request to receive
(mostly) consistent outputs. To help you keep track of these changes, we expose the
system_fingerprint field. If this value is different, you may see different outputs due to
changes we've made on our systems. Please note that this feature is in beta and only currently
supported for gpt-4-1106-preview and gpt-3.5-turbo-1106 .
Context
Reproducibility has always been a big request from user communities when using our APIs. For
instance, when granted the capability of getting reproducible numerical results, users can unlock
quite a few use cases that are sensitive to numerical changes.
The Chat Completions and Completions APIs are non-deterministic by default (which means
model outputs may differ from request to request), but now offer some control towards
deterministic outputs using a few model level controls.
This can unlock consistent completions, which enables full control over model behavior for
anything built on top of the APIs, and is quite useful for reproducing results and testing, so you
get peace of mind from knowing exactly what you'd get.
To receive (mostly) deterministic outputs across API calls:
Set the seed parameter to any integer of your choice, and use the same value across requests.
Set all other parameters (prompt, temperature, top_p, etc.) to the same values across
requests.
In the response, check the system_fingerprint field. The system fingerprint is an identifier
for the current combination of model weights, infrastructure, and other configuration
options used by OpenAI servers to generate the completion. It changes whenever you
change request parameters, or OpenAI updates numerical configuration of the
infrastructure serving our models (which may happen a few times a year).
If the seed , request parameters, and system_fingerprint all match across your requests, then
model outputs will mostly be identical. There is a small chance that responses differ even when
request parameters and system_fingerprint match, due to the inherent non-determinism of
our models.
seed
If specified, our system will make a best effort to sample deterministically, such that repeated
requests with the same seed and parameters should return the same result. Determinism is not
guaranteed, and you should refer to the system_fingerprint response parameter to monitor
changes in the backend.
system_fingerprint
This fingerprint represents the backend configuration that the model runs with. It can be used in
conjunction with the seed request parameter to understand when backend changes have been
made that might impact determinism. This is the indicator of whether users should expect
"almost always the same result".
In this example, we will demonstrate how to generate a short excerpt using a fixed seed. This
can be particularly useful in scenarios where you need to generate consistent results for testing,
debugging, or for applications that require consistent outputs.
Python SDK
!pip install --upgrade openai # Switch to the latest version of OpenAI (1.3.3 at time of writing)
import openai
import asyncio
from IPython.display import display, HTML
GPT_MODEL = "gpt-3.5-turbo-1106"
response = openai.chat.completions.create(
model=GPT_MODEL,
messages=messages,
seed=seed,
max_tokens=200,
temperature=temperature,
)
response_content = response.choices[0].message.content
system_fingerprint = response.system_fingerprint
prompt_tokens = response.usage.prompt_tokens
completion_tokens = response.usage.total_tokens - response.usage.prompt_tokens
table = f"""
<table>
<tr><th>Response</th><td>{response_content}</td></tr>
<tr><th>System Fingerprint</th><td>{system_fingerprint}</td></tr>
<tr><th>Number of prompt tokens</th><td>{prompt_tokens}</td></tr>
<tr><th>Number of completion tokens</th><td>{completion_tokens}</td></tr>
</table>
"""
display(HTML(table))
return response_content
except Exception as e:
print(f"An error occurred: {e}")
return None
from scipy.spatial.distance import cosine
def calculate_average_distance(responses):
    """
    This function calculates the average distance between the embeddings of the responses.
    The distance between embeddings is a measure of how similar the responses are.
    """
    # Calculate embeddings for each response (a get_embedding helper wrapping the embeddings endpoint is assumed)
    response_embeddings = [get_embedding(response) for response in responses]
    # Average cosine distance over every pair of responses
    distances = [
        cosine(response_embeddings[i], response_embeddings[j])
        for i in range(len(response_embeddings))
        for j in range(i + 1, len(response_embeddings))
    ]
    return sum(distances) / len(distances)
First, let's try generating a few different versions of a short excerpt about "a journey to Mars"
without the seed parameter. This is the default behavior:
messages = [{"role": "user", "content": "Generate a short excerpt of news about a journey to Mars."}]  # prompt wording illustrative
responses = [get_chat_response(messages) for _ in range(5)]  # several runs, no fixed seed
Output 1
----------
Response: "NASA's Mars mission reaches critical stage as spacecraft successfully enters orbit around
the red planet. The historic journey, which began over a year ago, has captured the
world's attention as scientists and astronauts prepare to land on Mars for the first time.
The mission is expected to provide valuable insights into the planet's geology,
atmosphere, and potential for sustaining human life in the future."
System Fingerprint: fp_772e8125bb
Number of prompt tokens: 29
Number of completion tokens: 76
Output 2
----------
Now, let's try to run the same code with a constant seed of 123 and a temperature of 0, and
compare the responses and system_fingerprint .
SEED = 123
responses = [get_chat_response(messages, seed=SEED, temperature=0) for _ in range(5)]  # several runs with a fixed seed
average_distance = calculate_average_distance(responses)
print(f"The average distance between responses is: {average_distance}")
Output 1
----------
Number of prompt tokens: 29
Number of completion tokens: 113
Output 2
----------
As we can observe, the seed parameter allows us to generate much more consistent results.
Conclusion
We demonstrated how to use a fixed integer seed to generate consistent outputs from our
model. This is particularly useful in scenarios where reproducibility is important. However, it's
important to note that while the seed ensures consistency, it does not guarantee the quality of
the output. Note that when you want to use reproducible outputs, you need to set the seed to
the same integer across Chat Completions calls. You should also match any other parameters
like temperature , max_tokens etc. Further extension of reproducible outputs could be to use
consistent seed when benchmarking/evaluating the performance of different prompts or
models, to ensure that each version is evaluated under the same conditions, making the
comparisons fair and the results reliable.
We will use t-SNE to reduce the dimensionality of the embeddings from 1536 to 2. Once the
embeddings are reduced to two dimensions, we can plot them in a 2D scatter plot. The dataset
is created in the Get_embeddings_from_dataset Notebook.
1. Reduce dimensionality
We reduce the dimensionality to 2 dimensions using t-SNE decomposition.
import pandas as pd
from sklearn.manifold import TSNE
import numpy as np
from ast import literal_eval
# Load the reviews and convert the stored embedding strings back into arrays
df = pd.read_csv("data/fine_food_reviews_with_embeddings_1k.csv")
matrix = np.array(df.embedding.apply(literal_eval).to_list())
# Reduce the 1536-dimensional embeddings to 2 dimensions with t-SNE
vis_dims = TSNE(n_components=2, perplexity=15, random_state=42, init="random").fit_transform(matrix)
vis_dims.shape
(1000, 2)
We colour each review by its star rating, ranging from red to green.
import matplotlib
import matplotlib.pyplot as plt
colors = ["red", "darkorange", "gold", "turquoise", "darkgreen"]  # red-to-green scale (exact colours illustrative)
x = [p[0] for p in vis_dims]
y = [p[1] for p in vis_dims]
color_indices = df.Score.values - 1  # star ratings 1-5 mapped to colour indices 0-4
colormap = matplotlib.colors.ListedColormap(colors)
plt.scatter(x, y, c=color_indices, cmap=colormap, alpha=0.3)
# Mark the average position of each star rating with an 'x'
for score in [0, 1, 2, 3, 4]:
    avg_x = np.array(x)[df.Score - 1 == score].mean()
    avg_y = np.array(y)[df.Score - 1 == score].mean()
    color = colors[score]
    plt.scatter(avg_x, avg_y, marker='x', color=color, s=100)
This notebook shows how to implement a question answering system with LangChain, Deep
Lake as a vector store and OpenAI embeddings. We will take the following steps to achieve this:
5. Done!
You can also follow other tutorials such as question answering over any type of data (PDFs, json,
csv, text): chatting with any data stored in Deep Lake, code understanding, or question
answering over PDFs, or recommending songs.
Install requirements
Authentication
os.environ['OPENAI_API_KEY'] = getpass.getpass()
··········
We will use a 20000 sample subset of the cohere-wikipedia-22 dataset for this example.
import deeplake
ds = deeplake.load("hub://activeloop/cohere-wikipedia-22-sample")
ds.summary()
['The 24-hour clock is a way of telling the time in which the day runs from midnight to midnigh
'A time in the 24-hour clock is written in the form hours:minutes (for example, 01:23), or hou
'However, the US military prefers not to say 24:00 - they do not like to have two names for th
Let's define a dataset_path , this is where your Deep Lake vector store will house the text
embeddings.
dataset_path = 'wikipedia-embeddings-deeplake'
embedding = OpenAIEmbeddings(model="text-embedding-3-small")
db = DeepLake(dataset_path, embedding=embedding, overwrite=True)
... and populate it with samples, one batch at a time, using the add_texts method.
batch_size = 100
nsamples = 10  # for quick testing; the summary below reflects this small subset
for i in range(0, nsamples, batch_size):
    i_end = min(nsamples, i + batch_size)  # end of the current batch
    batch = ds[i:i_end]
    id_batch = batch.ids.data()["value"]
    text_batch = batch.text.data()["value"]
    meta_batch = batch.metadata.data()["value"]
    db.add_texts(text_batch, metadatas=meta_batch, ids=id_batch)
The underlying Deep Lake dataset object is accessible through db.vectorstore.dataset , and
the data structure can be summarized using db.vectorstore.summary() , which shows 4 tensors
with 10 samples:
db.vectorstore.summary()
Let's try running a prompt and check the output. Internally, this API performs an embedding
search to find the most relevant data to feed into the LLM context.
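The retrieval call itself is not shown above; a minimal sketch of what such a query could look
like, assuming a LangChain RetrievalQA chain over the Deep Lake store with a ChatOpenAI model
(the model choice and the sample question are illustrative):
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
# build a retrieval-augmented QA chain on top of the Deep Lake vector store
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    chain_type="stuff",
    retriever=db.as_retriever(),
)
# the retriever embeds the question, finds the closest chunks, and passes them to the LLM
qa.run("Why does the US military not say 24:00?")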
'The military prefers not to say 24:00 because they do not like to have two names for the same
Et voila!
We will fine-tune a babbage-002 classifier (replacement for the ada models) to distinguish
between the two sports: Baseball and Hockey.
Data exploration
The newsgroup dataset can be loaded using sklearn. First we will look at the data itself:
print(sports_dataset['data'][0])
--
Doug Bank Private Systems Division
[email protected] Motorola Communications Sector
[email protected] Schaumburg, Illinois
[email protected] 708-576-8207
sports_dataset.target_names[sports_dataset['target'][0]]
'rec.sport.baseball'
One sample from the baseball category can be seen above. It is an email to a mailing list. We
can observe that we have 1197 examples in total, which are evenly split between the two sports.
Data Preparation
We transform the dataset into a pandas dataframe, with a column for prompt and completion.
The prompt contains the email from the mailing list, and the completion is the name of the sport,
either hockey or baseball. For demonstration purposes and to keep fine-tuning fast, we take
only 300 examples. In a real use case, the more examples you have, the better the performance.
import pandas as pd
Both baseball and hockey are single tokens. We save the dataset as a jsonl file.
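A minimal sketch of that preparation step, assuming the sklearn newsgroup data loaded above
(the output filename is illustrative):
# build a prompt/completion dataframe from the first 300 examples
labels = [sports_dataset.target_names[x].split('.')[-1] for x in sports_dataset['target']]
texts = [text.strip() for text in sports_dataset['data']]
df = pd.DataFrame(zip(texts, labels), columns=['prompt', 'completion'])[:300]
# save as JSONL for the fine-tuning data preparation tool
df.to_json("sport2.jsonl", orient='records', lines=True)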
We can now use a data preparation tool which will suggest a few improvements to our dataset
before fine-tuning. Before launching the tool we update the openai library to ensure we're using
the latest data preparation tool. We additionally specify -q which auto-accepts all suggestions.
Analyzing...
After you’ve fine-tuned a model, remember that your prompt has to end with the indicator string
Once your model starts training, it'll approximately take 30.8 minutes to train a `curie` model
The tool helpfully suggests a few improvements to the dataset and splits the dataset into
training and validation set.
A suffix between a prompt and a completion is necessary to tell the model that the input text
has stopped, and that it now needs to predict the class. Since we use the same separator in each
example, the model is able to learn that it is meant to predict either baseball or hockey
following the separator. A whitespace prefix in completions is useful, as most word tokens are
tokenized with a space prefix. The tool also recognized that this is likely a classification task, so
it suggested to split the dataset into training and validation datasets. This will allow us to easily
measure expected performance on new data.
Fine-tuning
The tool suggests we run the following command to train the dataset. Since this is a
classification task, we would like to know what the generalization performance on the provided
validation set is for our classification use case.
We can simply copy the suggested command from the CLI tool. We specifically add -m ada to
fine-tune a cheaper and faster ada model, which is usually comparable in performance to slower
and more expensive models on classification use cases.
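With the current Python SDK, the equivalent job creation looks roughly like the sketch below
(the prepared file names follow the data preparation tool's naming convention and are
assumptions here):
from openai import OpenAI
client = OpenAI()
# upload the prepared training and validation files
train_file = client.files.create(file=open("sport2_prepared_train.jsonl", "rb"), purpose="fine-tune")
valid_file = client.files.create(file=open("sport2_prepared_valid.jsonl", "rb"), purpose="fine-tune")
# create the fine-tuning job on babbage-002 (the replacement for the older ada base model)
fine_tuning_job = client.fine_tuning.jobs.create(
    training_file=train_file.id,
    validation_file=valid_file.id,
    model="babbage-002",
)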
print(fine_tuning_job)
fine_tune_results = client.fine_tuning.jobs.retrieve(fine_tuning_job.id)
print(fine_tune_results.finished_at)
1704414393
fine_tune_results = client.fine_tuning.jobs.retrieve(fine_tuning_job.id).result_files
result_file = client.files.retrieve(fine_tune_results[0])
content = client.files.content(result_file.id)
# save content to file
with open("result.csv", "wb") as f:
f.write(content.text.encode("utf-8"))
results = pd.read_csv('result.csv')
results[results['train_accuracy'].notnull()].tail(1)
The accuracy reaches 99.6%. On the plot below we can see how accuracy on the validation set
increases during the training run.
results[results['train_accuracy'].notnull()]['train_accuracy'].plot()
Using the model
We need to use the same separator following the prompt which we used during fine-tuning. In
this case it is \n\n###\n\n . Since we're concerned with classification, we want the temperature
to be as low as possible, and we only require a one-token completion to determine the prediction
of the model.
# fine_tune_results was re-used above for the result files, so fetch the job again for the model name
ft_model = client.fine_tuning.jobs.retrieve(fine_tuning_job.id).fine_tuned_model
# 'test' is assumed to be the prepared validation set loaded as a dataframe from the _valid.jsonl file
res = client.completions.create(model=ft_model, prompt=test['prompt'][0] + '\n\n###\n\n', max_tokens=1, temperature=0)
res.choices[0].text
' hockey'
To get the log probabilities, we can specify logprobs parameter on the completion request
We can see that the model predicts hockey as a lot more likely than baseball, which is the
correct prediction. By requesting log_probs, we can see the prediction (log) probability for each
class.
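A sketch of such a request (the sample prompt variable is illustrative):
res = client.completions.create(
    model=ft_model,
    prompt=sample_hockey_tweet + '\n\n###\n\n',  # sample_hockey_tweet is an assumed example input
    max_tokens=1,
    temperature=0,
    logprobs=2,  # return log probabilities for the top 2 tokens
)
res.choices[0].logprobs.top_logprobs[0]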
Generalization
Interestingly, our fine-tuned classifier is quite versatile. Despite being trained on emails to
different mailing lists, it also successfully predicts tweets.
' hockey'
sample_baseball_tweet="""BREAKING: The Tampa Bay Rays are finalizing a deal to acquire slugger Nelson
res = client.completions.create(model=ft_model, prompt=sample_baseball_tweet + '\n\n###\n\n', max_tokens=1, temperature=0)
res.choices[0].text
This notebook guides you step by step on using Tair as a vector database for OpenAI
embeddings.
4. Using Tair to perform the nearest neighbour search in the created collection.
What is Tair
Tair is a cloud native in-memory database service that is developed by Alibaba Cloud. Tair is
compatible with open source Redis and provides a variety of data models and enterprise-class
capabilities to support your real-time online scenarios. Tair also introduces persistent memory-
optimized instances that are based on the new non-volatile memory (NVM) storage medium.
These instances can reduce costs by 30%, ensure data persistence, and provide almost the same
performance as in-memory databases. Tair has been widely used in areas such as government
affairs, finance, manufacturing, healthcare, and pan-Internet to meet their high-speed query and
computing requirements.
TairVector is an in-house data structure that provides high-performance real-time storage and
retrieval of vectors. TairVector provides two indexing algorithms: Hierarchical Navigable Small
World (HNSW) and Flat Search. Additionally, TairVector supports multiple distance functions,
such as Euclidean distance, inner product, and Jaccard distance. Compared with traditional
vector retrieval services, TairVector has the following advantages:
Stores all data in memory and supports real-time index updates to reduce latency of read
and write operations.
Deployment options
Using the Tair Cloud Vector Database. Click here to deploy it quickly.
Prerequisites
Install requirements
This notebook obviously requires the openai and tair packages, but there are also some
other additional libraries we will use. The following command installs them all:
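A sketch of that install cell (the extra libraries listed are assumptions based on the imports used
later in this notebook):
! pip install openai tair pandas wget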
The OpenAI API key is used for vectorization of the documents and queries.
If you don't have an OpenAI API key, you can get one from
https://beta.openai.com/account/api-keys.
import getpass
import openai
Connect to Tair
Connecting to a running instance of Tair server is easy with the official Python library.
from tair import Tair as TairClient
# connection string for your Tair instance (placeholder host and password)
TAIR_URL = "redis://default:<password>@<host>:6379"
url = TAIR_URL
client = TairClient.from_url(url)
client.ping()
True
import wget
embeddings_url = "https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip"
wget.download(embeddings_url)
'vector_database_wikipedia_articles_embedded (1).zip'
import zipfile
import os
import re
import tempfile
current_directory = os.getcwd()
zip_file_path = os.path.join(current_directory, "vector_database_wikipedia_articles_embedded.zip")
output_directory = os.path.join(current_directory, "../../data")
with zipfile.ZipFile(zip_file_path, "r") as zip_ref:
    zip_ref.extractall(output_directory)
# check that the extracted CSV is where we expect it
file_name = "vector_database_wikipedia_articles_embedded.csv"
file_path = os.path.join(output_directory, file_name)
if os.path.exists(file_path):
    print(f"The file {file_name} exists in the data directory.")
else:
    print(f"The file {file_name} does not exist in the data directory.")
Create Index
Tair stores data in indexes where each object is described by one key. Each key contains a vector
and multiple attribute_keys.
We will start with creating two indexes, one for title_vector and one for content_vector, and
then we will fill it with our precomputed embeddings.
# Index parameters (the index name matches the counts shown below; the remaining values are
# typical TairVector settings and are assumptions here)
index = "openai_test"
embedding_dim = 1536  # dimensionality of text-embedding-3-small vectors
distance_type = "L2"  # Euclidean distance
index_type = "HNSW"
data_type = "FLOAT32"
# Create two indexes, one for title_vector and one for content_vector, skip if already exists
index_names = [index + "_title_vector", index + "_content_vector"]
for index_name in index_names:
    index_connection = client.tvs_get_index(index_name)
    if index_connection is not None:
        print("Index already exists")
    else:
        client.tvs_create_index(name=index_name, dim=embedding_dim, distance_type=distance_type,
                                index_type=index_type, data_type=data_type)
In this section we are going to load the data prepared previously for this session, so you don't
have to recompute the embeddings of the Wikipedia articles with your own credits.
import pandas as pd
from ast import literal_eval
# Path to your local CSV file
csv_file_path = '../../data/vector_database_wikipedia_articles_embedded.csv'
article_df = pd.read_csv(csv_file_path)
# Check the data count to make sure all the points have been stored
for index_name in index_names:
stats = client.tvs_get_index(index_name)
count = int(stats["current_record_count"]) - int(stats["delete_record_count"])
print(f"Count in {index_name}:{count}")
Count in openai_test_title_vector:25000
Count in openai_test_content_vector:25000
Search data
Once the data is put into Tair we will start querying the collection for the closest vectors. We
may provide an additional parameter vector_name to switch from title to content based search.
Since the precomputed embeddings were created with text-embedding-3-small OpenAI
model, we also have to use it during search.
def query_tair(client, query, vector_name="title_vector", top_k=5):
return query_result
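The body of query_tair is not shown in this extract; a minimal sketch, assuming the Tair client's
tvs_knnsearch call is used for the nearest-neighbour lookup (the sample query is illustrative):
def query_tair(client, query, vector_name="title_vector", top_k=5):
    # embed the query with the same model used for the stored vectors
    embedded_query = openai.embeddings.create(input=query, model="text-embedding-3-small").data[0].embedding
    # nearest-neighbour search on the chosen index; returns the closest keys with their distances
    query_result = client.tvs_knnsearch(index + "_" + vector_name, top_k, embedded_query)
    return query_result
query_result = query_tair(client, "modern art in Europe", vector_name="title_vector")
print(query_result)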
import openai
import numpy as np
In this Jupyter Notebook, we construct a Kangas DataGrid containing the data and projections
of the embeddings into 2 dimensions.
What is Kangas?
Kangas is an open source, mixed-media, dataframe-like tool for data scientists. It was
developed by Comet, a company that helps reduce the friction of moving models into
production.
1. Setup
To get started, we pip install kangas, and import it.
import kangas as kg
2. Create a DataGrid
We create a Kangas DataGrid with the original data and the embeddings. The data is composed
of rows of reviews, and the embeddings are composed of 1536 floating-point values. In this
example, we get the data directly from GitHub, in case you aren't running this notebook inside
OpenAI's repo.
We use Kangas to read the CSV file into a DataGrid for further processing.
data = kg.read_csv("https://raw.githubusercontent.com/openai/openai-cookbook/main/examples/data/fine_
data.info()
data
(DataGrid preview with columns: row-id, ProductId, UserId, Score, Summary, Text, combined, n_tokens, embedding)
dg = kg.DataGrid(
name="openai_embeddings",
columns=data.get_columns(),
converters={"Score": str},
)
import ast
# convert the stored embedding string in each row into a Kangas Embedding asset
for row in data:
    embedding = ast.literal_eval(row[8])
    row[8] = kg.Embedding(
        embedding,
        name=str(row[3]),
        text="%s - %.10s" % (row[3], row[4]),
        projection="umap",
    )
    dg.append(row)
The new DataGrid now has an Embedding column with proper datatype.
dg.info()
dg.save()
3. Render 2D Projections
To render the data directly in the notebook, simply show it. Note that each row contains an
embedding projection.
dg.show()
Group by "Score" to see rows of each group.
In this guide you will learn how to use the OpenAI Embedding API to generate language
embeddings, and then index those embeddings in the Pinecone vector database for fast and
scalable vector search.
This is a powerful and common combination for building semantic search, question-answering,
threat-detection, and other applications that rely on NLP and search over a large corpus of text
data.
Use the OpenAI Embedding API to generate vector embeddings of your documents (or any
text data).
Upload those vector embeddings into Pinecone, which can store and index millions/billions
of these vector embeddings, and search through them at ultra-low latencies.
Search
Pass your query text or document through the OpenAI Embedding API again.
Get back semantically similar documents, even if they don't share any keywords with the
query.
Let's get started...
Setup
We first need to set up our environment and retrieve API keys for OpenAI and Pinecone. Let's
start with our environment: we need HuggingFace Datasets for our data, and the OpenAI and
Pinecone clients:
Then we initialize our connection to OpenAI Embeddings and Pinecone vector DB. Sign up for
an API key over at OpenAI and Pinecone.
client = OpenAI(
api_key="OPENAI_API_KEY"
) # get API key from platform.openai.com
We can now create embeddings with an OpenAI embedding model ( text-embedding-3-small ) like so:
MODEL = "text-embedding-3-small"
res = client.embeddings.create(
input=[
"Sample document text goes here",
"there will be several phrases in each batch"
], model=MODEL
)
res
CreateEmbeddingResponse(data=[Embedding(embedding=[-0.0007019874756224453, 0.017813093960285187
vector 0: 1536
vector 1: 1536
1536
Initialize connection to Pinecone, you can get a free API key in the Pinecone dashboard.
pc = Pinecone(api_key="...")
import time
from pinecone import ServerlessSpec
index_name = 'semantic-search-openai'
# serverless index specification (cloud and region here are illustrative)
spec = ServerlessSpec(cloud='aws', region='us-east-1')
# check if index already exists (it shouldn't if this is your first run)
if index_name not in pc.list_indexes().names():
    # if does not exist, create index
    pc.create_index(
        index_name,
        dimension=len(embeds[0]),  # dimensionality of text-embedding-3-small
        metric='dotproduct',
        spec=spec
    )
# wait for index to be initialized
while not pc.describe_index(index_name).status['ready']:
time.sleep(1)
# connect to index
index = pc.Index(index_name)
time.sleep(1)
# view index stats
index.describe_index_stats()
{'dimension': 1536,
'index_fullness': 0.0,
'namespaces': {},
'total_vector_count': 0}
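The dataset load itself is not shown in this extract; a minimal sketch using the datasets library
(the split size matches the 1,000 rows reported below):
from datasets import load_dataset
# load the first 1K rows of the TREC question classification dataset
trec = load_dataset('trec', split='train[:1000]')
trec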
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning:
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://hugging
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or da
warnings.warn(
Dataset({
features: ['text', 'coarse_label', 'fine_label'],
num_rows: 1000
})
trec[0]
{'text': 'How did serfdom develop in and then leave Russia ?',
'coarse_label': 2,
'fine_label': 26}
Then we create a vector embedding for each phrase using OpenAI, and upsert the ID, vector
embedding, and original text for each phrase to Pinecone.
from tqdm.auto import tqdm
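The loop itself is not shown in this extract; a minimal sketch of the batched embed-and-upsert
step (the batch size and metadata layout are assumptions):
batch_size = 32  # process the questions in small batches
for i in tqdm(range(0, len(trec['text']), batch_size)):
    i_end = min(i + batch_size, len(trec['text']))
    lines_batch = trec['text'][i:i_end]
    ids_batch = [str(n) for n in range(i, i_end)]
    # create embeddings for the batch
    res = client.embeddings.create(input=lines_batch, model=MODEL)
    embeds = [record.embedding for record in res.data]
    # keep the original text as metadata so we can read results back later
    meta = [{'text': line} for line in lines_batch]
    # upsert (id, vector, metadata) tuples into the index
    index.upsert(vectors=list(zip(ids_batch, embeds, meta)))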
Querying
With our data indexed, we're now ready to move onto performing searches. This follows a
similar process to indexing. We start with a text query , that we would like to use to find similar
sentences. As before we encode this with OpenAI's text similarity Babbage model to create a
query vector xq . We then use xq to query the Pinecone index.
xq = client.embeddings.create(input=query, model=MODEL).data[0].embedding
Now query...
The response from Pinecone includes our original text in the metadata field, let's print out the
top_k most similar questions and their respective similarity scores.
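A sketch of the query and the printout (field access follows the Pinecone client's response
format):
# retrieve the 5 most similar questions, including their stored metadata
res = index.query(vector=xq, top_k=5, include_metadata=True)
for match in res['matches']:
    print(f"{match['score']:.2f}: {match['metadata']['text']}")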
Looks good, let's make it harder and replace "depression" with the incorrect term "recession".
query = "What was the cause of the major recession in the early 20th century?"
And again...
query = "Why was there a long-term economic downturn in the early 20th century?"
Looks great, our semantic search pipeline is clearly able to capture the meaning of each of our
queries and return the most semantically similar questions from the already indexed
questions.
pc.delete_index(index_name)
This notebook takes you through examples of using a cross-encoder to re-rank search results.
This is a common use case with our customers, where you've implemented semantic search
using embeddings (produced using a bi-encoder) but the results are not as accurate as your
use case requires. A possible cause is that there is some business rule you can use to rerank the
documents such as how recent or how popular a document is.
However, often there are subtle domain-specific rules that help determine relevancy, and this is
where a cross-encoder can be useful. Cross-encoders are more accurate than bi-encoders but
they don't scale well, so using them to re-order a shortened list returned by semantic search is
the ideal use case.
Example
Consider a search task with D documents and Q queries.
The brute force approach of computing every pairwise relevance is expensive; its cost scales as
D * Q . This is known as cross-encoding. A bi-encoder instead embeds documents and queries
independently, so its cost scales roughly as D + Q .
Although embeddings-based search is faster, the quality can be worse. To get the best of both,
one common approach is to use embeddings (or another bi-encoder) to cheaply identify top
candidates, and then use GPT (or another cross-encoder) to expensively re-rank those top
candidates. The cost of this hybrid approach scales as (D + Q) * cost of embedding + (N * Q)
* cost of re-ranking , where N is the number of candidates re-ranked.
Walkthrough
To illustrate this approach we'll use text-davinci-003 with logprobs enabled to build a GPT-
powered cross-encoder. Our GPT models have strong general language understanding, which
when tuned with some few-shot examples can provide a simple and effective cross-encoding
option.
This notebook drew on this great article by Weaviate, and this excellent explanation of bi-
encoders vs. cross-encoders from Sentence Transformers.
import arxiv
from math import exp
import openai
import os
import pandas as pd
from tenacity import retry, wait_random_exponential, stop_after_attempt
import tiktoken
from openai import OpenAI
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
OPENAI_MODEL = "text-davinci-003"  # completions-capable model used for the logprobs-based cross-encoder below
Search
We'll use the arXiv search service for this example, but this step could be performed by any
search service you have. The key item to consider is over-fetching slightly to capture all the
potentially relevant documents, before re-sorting them.
result_list = []
# search loop (the query string here is an assumption; the original search call is not shown in this extract)
for result in arxiv.Search(query="sentence embeddings", max_results=20).results():
    result_dict = {"title": result.title, "summary": result.summary,
                   "article_url": result.entry_id, "pdf_url": result.pdf_url}
    result_list.append(result_dict)
result_list[0]
{'title': 'SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explaina
'summary': 'Models based on large-pretrained language models, such as S(entence)BERT,\nprovide
'article_url': 'http://arxiv.org/abs/2206.07023v2',
'pdf_url': 'http://arxiv.org/pdf/2206.07023v2'}
1: SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Sema
2: Are Classes Clusters?
3: Semantic Composition in Visually Grounded Language Models
4: Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions
5: Learning Probabilistic Sentence Representations from Paraphrases
6: Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence
7: How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for
8: Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences
9: Vec2Sent: Probing Sentence Embeddings with Natural Language Generation
10: Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings
11: SentPWNet: A Unified Sentence Pair Weighting Network for Task-specific Sentence Embedding
12: Learning Joint Representations of Videos and Sentences with Web Image Search
13: Character-based Neural Networks for Sentence Pair Modeling
14: Train Once, Test Anywhere: Zero-Shot Learning for Text Classification
15: Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models
16: Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
17: In Search for Linear Relations in Sentence Embedding Spaces
18: Learning to Borrow -- Relation Representation for Without-Mention Entity-Pairs for Knowledg
19: Efficient and Flexible Topic Modeling using Pretrained Embeddings and Bag of Sentences
20: Relational Sentence Embedding for Flexible Semantic Matching
Cross-encoder
We'll create a cross-encoder using the Completions endpoint - the key factors to consider here
are:
Make your examples domain-specific - the strength of cross-encoders comes when you
tailor them to your domain.
There is a trade-off between how many potential examples to re-rank vs. processing speed.
Consider batching and parallel processing cross-encoder requests to process them more
quickly.
Build a prompt to assess relevance and provide few-shot examples to tune it to your
domain.
Add a logit bias for the tokens for Yes and No to decrease the likelihood of any other
tokens occurring.
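The token IDs can be looked up with tiktoken; a sketch, assuming the GPT-3-era tokenizer that
produces the IDs shown below:
tokenizer = tiktoken.encoding_for_model("text-davinci-003")
# token IDs for " Yes" and " No" (with the leading space the model actually generates)
yes_ids = tokenizer.encode(" Yes")
no_ids = tokenizer.encode(" No")
yes_ids, no_ids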
([3363], [1400])
prompt = '''
You are an Assistant responsible for helping detect whether the retrieved document is relevant to the
Query: What are some papers to learn about PPO reinforcement learning?
Document: """Proximal Policy Optimization and its Dynamic Version for Sequence Generation: In sequenc
Relevant: Yes
Query: {query}
Document: """{document}"""
Relevant:
'''
# Scores the relevance of one query/document pair with the Completions endpoint and
# returns the prediction text plus its log probability
def document_relevance(query, document):
    response = client.completions.create(
        model=OPENAI_MODEL,
        prompt=prompt.format(query=query, document=document),
        temperature=0,
        logprobs=1,
        logit_bias={3363: 1, 1400: 1},
        max_tokens=1,
    )
    return (
        query,
        document,
        response.choices[0].text,
        response.choices[0].logprobs.token_logprobs[0],
    )

# Set logprobs to 1 so our response will include the most probable token the model identified
response = client.completions.create(
    model=OPENAI_MODEL,
    prompt=prompt.format(query=query, document=content),
    temperature=0,
    logprobs=1,
    logit_bias={3363: 1, 1400: 1},
    max_tokens=1,
)
result = response.choices[0]
print(f"Result was {result.text}")
print(f"Logprobs was {result.logprobs.token_logprobs[0]}")
print("\nBelow is the full logprobs object\n\n")
print(result.logprobs)
Result was Yes
Logprobs was -0.05869877
{
"tokens": [
"Yes"
],
"token_logprobs": [
-0.05869877
],
"top_logprobs": [
{
"Yes": -0.05869877
}
],
"text_offset": [
5764
]
}
output_list = []
for x in result_list:
content = x["title"] + ": " + x["summary"]
try:
output_list.append(document_relevance(query, document=content))
except Exception as e:
print(e)
output_list[:10]
output_df = pd.DataFrame(
output_list, columns=["query", "document", "prediction", "logprobs"]
).reset_index()
# Use exp() to convert logprobs into probability
output_df["probability"] = output_df["logprobs"].apply(exp)
# Reorder based on likelihood of being Yes
output_df["yes_probability"] = output_df.apply(
lambda x: x["probability"] * -1 + 1
if x["prediction"] == "No"
else x["probability"],
axis=1,
)
output_df.head()
'In Search for Linear Relations in Sentence Embedding Spaces: We present an introductory invest
Conclusion
We've shown how to create a tailored cross-encoder to rerank academic papers. This approach
will work best where there are domain-specific nuances that can be used to pick the most
relevant corpus for your users, and where some pre-filtering has taken place to limit the amount
of data the cross-encoder will need to process.
A few typical use cases we've seen are:
Returning a list of 100 most relevant stock reports, then re-ordering into a top 5 or 10
based on the detailed context of a particular set of customer portfolios
Running after a classic rules-based search that gets the top 100 or 1000 most relevant
results to prune it according to a specific user's context
Taking the few-shot approach, as we have here, can work well when the domain is general
enough that a small number of examples will cover most reranking cases. However, as the
differences between documents become more specific you may want to consider the Fine-
tuning endpoint to make a more elaborate cross-encoder with a wider variety of examples.
There is also a latency impact of using text-davinci-003 that you'll need to consider, with
even our few examples above taking a couple seconds each - again, the Fine-tuning endpoint
may help you here if you are able to get decent results from an ada or babbage fine-tuned
model.
We've used the Completions endpoint from OpenAI to build our cross-encoder, but this area is
well-served by the open-source community. Here is an example from HuggingFace, for
example.
We hope you find this useful for tuning your search use cases, and look forward to seeing what
you build.
We can search through all our reviews semantically in a very efficient manner and at very low
cost, by embedding our search query, and then finding the most similar reviews. The dataset is
created in the Get_embeddings_from_dataset Notebook.
import pandas as pd
import numpy as np
from ast import literal_eval
datafile_path = "data/fine_food_reviews_with_embeddings_1k.csv"
df = pd.read_csv(datafile_path)
df["embedding"] = df.embedding.apply(literal_eval).apply(np.array)
Here we compare the cosine similarity of the embeddings of the query and the documents, and
show the top_n best matches.
from utils.embeddings_utils import get_embedding, cosine_similarity  # cookbook helper module (import path assumed)
def search_reviews(df, product_description, n=3, pprint=True):
    # embed the query and score every review by cosine similarity
    product_embedding = get_embedding(product_description, model="text-embedding-3-small")
    df["similarity"] = df.embedding.apply(lambda x: cosine_similarity(x, product_embedding))
    results = (
        df.sort_values("similarity", ascending=False)
        .head(n)
        .combined.str.replace("Title: ", "")
        .str.replace("; Content:", ": ")
    )
    if pprint:
        for r in results:
            print(r[:200])
            print()
    return results
results = search_reviews(df, "delicious beans", n=3)
Delicious!: I enjoy this white beans seasoning, it gives a rich flavor to the beans I just lov
Fantastic Instant Refried beans: Fantastic Instant Refried Beans have been a staple for my fam
Delicious: While there may be better coffee beans available, this is my first purchase and my
Tasty and Quick Pasta: Barilla Whole Grain Fusilli with Vegetable Marinara is tasty and has an
sooo good: tastes so good. Worth the money. My boyfriend hates wheat pasta and LOVES this. coo
Bland and vaguely gamy tasting, skip this one: As far as prepared dinner kits go, "Barilla Who
We can search through these reviews easily. To speed up computation, we can use a special
algorithm, aimed at faster search through embeddings.
great product, poor delivery: The coffee is excellent and I am a repeat buyer. Problem this t
As we can see, this can immediately deliver a lot of value. In this example we show how to
quickly find examples of delivery failures.
Disappointed: The metal cover has severely disformed. And most of the cookies inside have been
results = search_reviews(df, "pet food", n=2)
Great food!: I wanted a food for a a dog with skin problems. His skin greatly improved with th
Great food!: I wanted a food for a a dog with skin problems. His skin greatly improved with th
You can build your own applications with gpt-3.5-turbo or gpt-4 using the OpenAI API.
Chat models take a series of messages as input, and return an AI-written message as output.
This guide illustrates the chat format with a few example API calls.
# if needed, install and/or upgrade to the latest version of the OpenAI Python library
%pip install --upgrade openai
# import the OpenAI Python library for calling the OpenAI API
from openai import OpenAI
import os
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
model : the name of the model you want to use (e.g., gpt-3.5-turbo , gpt-4 , gpt-3.5-turbo-1106 )
messages : a list of message objects, where each object has two required fields:
role : the role of the messenger (either system , user , assistant or tool )
content : the content of the message (e.g., Write me a beautiful poem )
Messages can also contain an optional name field, which gives the messenger a name. E.g.,
example-user , Alice , BlackbeardBot . Names may not contain spaces.
Optional
top_logprobs : Specifies the number of most likely tokens to return at each position.
stop : Specifies up to 4 sequences where the API should stop generating tokens.
top_p : Uses nucleus sampling; considers tokens with top_p probability mass.
As of January 2024, you can also optionally submit a list of functions (tools) that the model can
choose to call, generating JSON arguments to feed into them. For details, see the documentation,
API reference, or the Cookbook guide How to call functions with chat models.
Typically, a conversation will start with a system message that tells the assistant how to behave,
followed by alternating user and assistant messages, but you are not required to follow this
format.
Let's look at an example chat API call to see how the chat format works in practice.
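The request that produced the response below is not included in this extract; a sketch that
matches the output (the knock-knock messages and model choice are reconstructed from the
reply shown and are assumptions):
import json
# example OpenAI Python library request
MODEL = "gpt-3.5-turbo"
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Knock knock."},
        {"role": "assistant", "content": "Who's there?"},
        {"role": "user", "content": "Orange."},
    ],
    temperature=0,
)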
print(json.dumps(json.loads(response.model_dump_json()), indent=4))
{
"id": "chatcmpl-8dee9DuEFcg2QILtT2a6EBXZnpirM",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": null,
"message": {
"content": "Orange who?",
"role": "assistant",
"function_call": null,
"tool_calls": null
}
}
],
"created": 1704461729,
"model": "gpt-3.5-turbo-0613",
"object": "chat.completion",
"system_fingerprint": null,
"usage": {
"completion_tokens": 3,
"prompt_tokens": 35,
"total_tokens": 38
}
}
choices : a list of completion objects (only one, unless you set n greater than 1)
finish_reason : the reason the model stopped generating text (either stop , or length if
the maximum token limit was reached)
tool_calls : the tool calls generated by the model, such as function calls, if tools are given
model : the full name of the model used to generate the response
system_fingerprint : This fingerprint represents the backend configuration that the model
runs with.
usage : the number of tokens used to generate the replies, counting prompt, completion,
and total
response.choices[0].message.content
'Orange who?'
Even non-conversation-based tasks can fit into the chat format, by placing the instruction in the
first user message.
For example, to ask the model to explain asynchronous programming in the style of the pirate
Blackbeard, we can structure the conversation as follows:
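A sketch of one such request (the exact instruction wording is illustrative); the second reply
further below comes from placing the same instruction entirely in a single user message:
# example with a system message
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain asynchronous programming in the style of the pirate Blackbeard."},
    ],
    temperature=0,
)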
print(response.choices[0].message.content)
Arr, me matey! Let me tell ye a tale of asynchronous programming, in the style of the fearsome
Picture this, me hearties. In the vast ocean of programming, there be times when ye need to per
Ye see, in traditional programming, ye be waitin' for one task to be done before movin' on to t
Instead of waitin' for a task to be completed, ye can be sendin' it off on its own journey, whi
Now, ye may be wonderin', how does this sorcery work? Well, me matey, it be all about callbacks
While the task be sailin' on its own, ye can be movin' on to the next task, without wastin' any
But wait, there be more! With promises, ye can be makin' even fancier arrangements. Instead of
Ye can be attachin' multiple promises to a task, promisin' different outcomes. And when the tas
So, me hearties, that be the tale of asynchronous programming, told in the style of the fearsom
print(response.choices[0].message.content)
Arr, me hearties! Gather 'round and listen up, for I be tellin' ye about the mysterious art of
Now, ye see, in the world of programming, there be times when we need to perform tasks that tak
In the olden days, we pirates used to wait patiently for each task to finish afore movin' on to
That be where asynchronous programming comes in, me mateys. It be a way to tackle multiple task
Ye see, in asynchronous programming, we be breakin' down our tasks into smaller chunks called "
Now, ye might be wonderin', "But Blackbeard, how be we know when a task be finished if we don't
When a coroutine be startin' its work, it be attachin' a callback or a promise to it. This be l
When a coroutine be finished with its task, it be sendin' a signal to the callback or fulfillin
So, me hearties, asynchronous programming be like havin' a crew of pirates workin' on different
Now, set sail, me mateys, and embrace the power of asynchronous programming like true pirates o
Best practices for instructing models may change from model version to model version. The
advice that follows applies to gpt-3.5-turbo-0301 and may not apply to future models.
System messages
The system message can be used to prime the assistant with different personalities or
behaviors.
Be aware that gpt-3.5-turbo-0301 does not generally pay as much attention to the system
message as gpt-4-0314 or gpt-3.5-turbo-0613 . Therefore, for gpt-3.5-turbo-0301 , we
recommend placing important instructions in the user message instead. Some developers have
found success in continually moving the system message near the end of the conversation to
keep the model's attention from drifting away as conversations get longer.
# An example of a system message that primes the assistant to explain concepts in great depth
response = client.chat.completions.create(
model=MODEL,
messages=[
{"role": "system", "content": "You are a friendly and helpful teaching assistant. You explain
{"role": "user", "content": "Can you explain how fractions work?"},
],
temperature=0,
)
print(response.choices[0].message.content)
Of course! Fractions are a way to represent parts of a whole. They are made up of two numbers:
Let's take an example to understand this better. Imagine you have a pizza that is divided into
Fractions can also be used to represent numbers less than 1. For example, if you eat half of a
Now, let's talk about equivalent fractions. Equivalent fractions are different fractions that r
Here's a question to check your understanding: If you have a cake divided into 12 equal slices
# An example of a system message that primes the assistant to give brief, to-the-point answers
response = client.chat.completions.create(
model=MODEL,
messages=[
{"role": "system", "content": "You are a laconic assistant. You reply with brief, to-the-poin
{"role": "user", "content": "Can you explain how fractions work?"},
],
temperature=0,
)
print(response.choices[0].message.content)
Fractions represent parts of a whole. They have a numerator (top number) and a denominator (bot
Few-shot prompting
In some cases, it's easier to show the model what you want rather than tell the model what you
want.
One way to show the model what you want is with faked example messages.
For example:
# An example of a faked few-shot conversation to prime the model into translating business jargon to
response = client.chat.completions.create(
model=MODEL,
messages=[
{"role": "system", "content": "You are a helpful, pattern-following assistant."},
{"role": "user", "content": "Help me translate the following corporate jargon into plain Engl
{"role": "assistant", "content": "Sure, I'd be happy to!"},
{"role": "user", "content": "New synergies will help drive top-line growth."},
{"role": "assistant", "content": "Things working well together will increase revenue."},
{"role": "user", "content": "Let's circle back when we have more bandwidth to touch base on o
{"role": "assistant", "content": "Let's talk later when we're less busy about how to do bette
{"role": "user", "content": "This late pivot means we don't have time to boil the ocean for t
],
temperature=0,
)
print(response.choices[0].message.content)
This sudden change in direction means we don't have enough time to complete the entire project
To help clarify that the example messages are not part of a real conversation, and shouldn't be
referred back to by the model, you can try setting the name field of system messages to
example_user and example_assistant .
# The business jargon translation example, but with example names for the example messages
response = client.chat.completions.create(
model=MODEL,
messages=[
{"role": "system", "content": "You are a helpful, pattern-following assistant that translates
{"role": "system", "name":"example_user", "content": "New synergies will help drive top-line
{"role": "system", "name": "example_assistant", "content": "Things working well together will
{"role": "system", "name":"example_user", "content": "Let's circle back when we have more ban
{"role": "system", "name": "example_assistant", "content": "Let's talk later when we're less
{"role": "user", "content": "This late pivot means we don't have time to boil the ocean for t
],
temperature=0,
)
print(response.choices[0].message.content)
This sudden change in direction means we don't have enough time to complete the entire project
If your first attempts fail, don't be afraid to experiment with different ways of priming or
conditioning the model.
As an example, one developer discovered an increase in accuracy when they inserted a user
message that said "Great job so far, these have been perfect" to help condition the model into
providing higher quality responses.
For more ideas on how to lift the reliability of the models, consider reading our guide on
techniques to increase reliability. It was written for non-chat models, but many of its principles
still apply.
4. Counting tokens
When you submit your request, the API transforms the messages into a sequence of tokens.
The token count affects both cost and latency, and determines when the reply gets cut off from
hitting the maximum token limit (4,096 for gpt-3.5-turbo or 8,192 for gpt-4 ).
You can use the following function to count the number of tokens that a list of messages will
use.
Note that the exact way that tokens are counted from messages may change from model to
model. Consider the counts from the function below an estimate, not a timeless guarantee.
In particular, requests that use the optional functions input will consume extra tokens on top of
the estimates calculated below.
Read more about counting tokens in How to count tokens with tiktoken.
import tiktoken
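The function itself does not appear in this extract; the sketch below follows the approach from
How to count tokens with tiktoken and matches the warning messages shown in the output
further down (exact per-message token accounting may differ across models):
def num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613"):
    """Return the number of tokens used by a list of messages."""
    try:
        encoding = tiktoken.encoding_for_model(model)
    except KeyError:
        print("Warning: model not found. Using cl100k_base encoding.")
        encoding = tiktoken.get_encoding("cl100k_base")
    if model in {
        "gpt-3.5-turbo-0613",
        "gpt-3.5-turbo-16k-0613",
        "gpt-4-0314",
        "gpt-4-32k-0314",
        "gpt-4-0613",
        "gpt-4-32k-0613",
    }:
        tokens_per_message = 3
        tokens_per_name = 1
    elif model == "gpt-3.5-turbo-0301":
        tokens_per_message = 4  # every message follows <|start|>{role/name}\n{content}<|end|>\n
        tokens_per_name = -1  # if there's a name, the role is omitted
    elif "gpt-3.5-turbo" in model:
        print("Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.")
        return num_tokens_from_messages(messages, model="gpt-3.5-turbo-0613")
    elif "gpt-4" in model:
        print("Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.")
        return num_tokens_from_messages(messages, model="gpt-4-0613")
    else:
        raise NotImplementedError(
            f"num_tokens_from_messages() is not implemented for model {model}."
        )
    num_tokens = 0
    for message in messages:
        num_tokens += tokens_per_message
        for key, value in message.items():
            num_tokens += len(encoding.encode(value))
            if key == "name":
                num_tokens += tokens_per_name
    num_tokens += 3  # every reply is primed with <|start|>assistant<|message|>
    return num_tokens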
# let's verify the function above matches the OpenAI API response
example_messages = [
{
"role": "system",
"content": "You are a helpful, pattern-following assistant that translates corporate jargon i
},
{
"role": "system",
"name": "example_user",
"content": "New synergies will help drive top-line growth.",
},
{
"role": "system",
"name": "example_assistant",
"content": "Things working well together will increase revenue.",
},
{
"role": "system",
"name": "example_user",
"content": "Let's circle back when we have more bandwidth to touch base on opportunities for
},
{
"role": "system",
"name": "example_assistant",
"content": "Let's talk later when we're less busy about how to do better.",
},
{
"role": "user",
"content": "This late pivot means we don't have time to boil the ocean for the client deliver
},
]
for model in [
# "gpt-3.5-turbo-0301",
# "gpt-4-0314",
# "gpt-4-0613",
"gpt-3.5-turbo-1106",
"gpt-3.5-turbo",
"gpt-4",
"gpt-4-1106-preview",
]:
print(model)
# example token count from the function defined above
print(f"{num_tokens_from_messages(example_messages, model)} prompt tokens counted by num_tokens_f
# example token count from the OpenAI API
response = client.chat.completions.create(model=model,
messages=example_messages,
temperature=0,
max_tokens=1)
token = response.usage.prompt_tokens
print(f'{token} prompt tokens counted by the OpenAI API.')
print()
gpt-3.5-turbo-1106
Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
gpt-3.5-turbo
Warning: gpt-3.5-turbo may update over time. Returning num tokens assuming gpt-3.5-turbo-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
gpt-4
Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
gpt-4-1106-preview
Warning: gpt-4 may update over time. Returning num tokens assuming gpt-4-0613.
129 prompt tokens counted by num_tokens_from_messages().
129 prompt tokens counted by the OpenAI API.
Clustering
Boris Power, Ted Sanders, Logan Kilpatrick
Mar 10, 2022
We use a simple k-means algorithm to demonstrate how clustering can be done. Clustering can
help discover valuable, hidden groupings within the data. The dataset is created in the
Get_embeddings_from_dataset Notebook.
# imports
import numpy as np
import pandas as pd
from ast import literal_eval
# load data
datafile_path = "./data/fine_food_reviews_with_embeddings_1k.csv"
df = pd.read_csv(datafile_path)
df["embedding"] = df.embedding.apply(literal_eval).apply(np.array) # convert string to numpy array
matrix = np.vstack(df.embedding.values)
matrix.shape
(1000, 1536)
We show the simplest use of K-means. You can pick the number of clusters that fits your use
case best.
n_clusters = 4
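The clustering call itself is not shown here; a minimal sketch using scikit-learn's KMeans (the
random_state is an assumption):
from sklearn.cluster import KMeans
# fit k-means on the embedding matrix and attach the cluster label to each review
kmeans = KMeans(n_clusters=n_clusters, init="k-means++", random_state=42, n_init=10)
kmeans.fit(matrix)
df["Cluster"] = kmeans.labels_
# average review score per cluster (output shown below)
df.groupby("Cluster").Score.mean().sort_values()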
Cluster
0 4.105691
1 4.191176
2 4.215613
3 4.306590
Name: Score, dtype: float64
# vis_dims2 is the 2-D t-SNE projection of the embedding matrix (computed as in the
# visualization example earlier); split it into x and y coordinates for plotting
x = [x for x, y in vis_dims2]
y = [y for x, y in vis_dims2]
# per-cluster averages (xs and ys hold the coordinates of the points in one cluster)
avg_x = xs.mean()
avg_y = ys.mean()
Let's show random samples from each cluster. We'll use gpt-4 (in the call below) to name the
clusters, based on a random sample of 5 reviews from each cluster.
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>"))
rev_per_cluster = 5  # number of sample reviews used to characterize each cluster
for i in range(n_clusters):
    print(f"Cluster {i} Theme:", end=" ")
reviews = "\n".join(
df[df.Cluster == i]
.combined.str.replace("Title: ", "")
.str.replace("\n\nContent: ", ": ")
.sample(rev_per_cluster, random_state=42)
.values
)
messages = [
{"role": "user", "content": f'What do the following customer reviews have in common?\n\nCusto
]
response = client.chat.completions.create(
model="gpt-4",
messages=messages,
temperature=0,
max_tokens=64,
top_p=1,
frequency_penalty=0,
presence_penalty=0)
print(response.choices[0].message.content.replace("\n", ""))
print("-" * 100)
Cluster 0 Theme: The theme of these customer reviews is food products purchased on Amazon.
5, Loved these gluten free healthy bars, saved $$ ordering on Amazon: These Kind Bars are so
1, Should advertise coconut as an ingredient more prominently: First, these should be called
5, very good!!: just like the runts<br />great flavor, def worth getting<br />I even o
5, Excellent product: After scouring every store in town for orange peels and not finding an
5, delicious: Gummi Frogs have been my favourite candy that I have ever tried. of co
-----------------------------------------------------------------------------------------------
Cluster 1 Theme: Pet food reviews
2, Messy and apparently undelicious: My cat is not a huge fan. Sure, she'll lap up the gravy,
4, The cats like it: My 7 cats like this food but it is a little yucky for the human. Piece
5, cant get enough of it!!!: Our lil shih tzu puppy cannot get enough of it. Everytime she se
1, Food Caused Illness: I switched my cats over from the Blue Buffalo Wildnerness Food to thi
5, My furbabies LOVE these!: Shake the container and they come running. Even my boy cat, who
-----------------------------------------------------------------------------------------------
Cluster 2 Theme: All the reviews are about different types of coffee.
5, Fog Chaser Coffee: This coffee has a full body and a rich taste. The price is far below t
5, Excellent taste: This is to me a great coffee, once you try it you will enjoy it, this
4, Good, but not Wolfgang Puck good: Honestly, I have to admit that I expected a little bette
5, Just My Kind of Coffee: Coffee Masters Hazelnut coffee used to be carried in a local coffe
5, Rodeo Drive is Crazy Good Coffee!: Rodeo Drive is my absolute favorite and I'm ready to or
-----------------------------------------------------------------------------------------------
Cluster 3 Theme: The theme of these customer reviews is food and drink products.
5, Wonderful alternative to soda pop: This is a wonderful alternative to soda pop. It's carb
5, So convenient, for so little!: I needed two vanilla beans for the Love Goddess cake that m
2, bot very cheesy: Got this about a month ago.first of all it smells horrible...it tastes
5, Delicious!: I am not a huge beer lover. I do enjoy an occasional Blue Moon (all o
3, Just ok: I bought this brand because it was all they had at Ranch 99 near us. I
-----------------------------------------------------------------------------------------------
It's important to note that clusters will not necessarily match what you intend to use them for. A
larger number of clusters will focus on more specific patterns, whereas a small number of
clusters will usually focus on the largest discrepancies in the data.
For this notebook we will be looking to classify a public dataset of transactions into a number of
categories that we have predefined. These approaches should be replicable to any multiclass
classification use case where we are trying to fit transactional data into predefined categories,
and by the end of running through this you should have a few approaches for dealing with both
labelled and unlabelled datasets.
Zero-shot Classification: First we'll do zero-shot classification to put transactions in one of
five named buckets using only a prompt for guidance
Classification with Embeddings: Next we'll create embeddings from a labelled subset and
use a traditional classifier on top of them
Fine-tuned Classification: Lastly we'll produce a fine-tuned model trained on our labelled
dataset to see how this compares to the zero-shot and few-shot classification approaches
Setup
%load_ext autoreload
%autoreload
%pip install openai 'openai[datalib]' 'openai[embeddings]' transformers
import openai
import pandas as pd
import numpy as np
import json
import os
COMPLETIONS_MODEL = "gpt-4"
Load dataset
We're using a public dataset of transactions over £25k for the National Library of Scotland. The
dataset has three features that we'll be using: Supplier, Description, and Transaction value (£).
Source:
https://data.nls.uk/data/organisational-data/transactions-over-25k/
359
transactions.head()
def request_completion(prompt):
    # the chat endpoint expects a messages list rather than a bare prompt string
    completion_response = openai.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
        max_tokens=5,
        top_p=1,
        frequency_penalty=0,
        presence_penalty=0,
        model=COMPLETIONS_MODEL)
    return completion_response
def classify_transaction(transaction,prompt):
prompt = prompt.replace('SUPPLIER_NAME',transaction['Supplier'])
prompt = prompt.replace('DESCRIPTION_TEXT',transaction['Description'])
prompt = prompt.replace('TRANSACTION_VALUE',str(transaction['Transaction value (£)']))
classification = request_completion(prompt).choices[0].message.content.replace('\n','')
return classification
# This function takes your training and validation outputs from the prepare_data function of the Fine
# confirms that each have the same number of classes.
# If they do not have the same number of classes the fine-tune will fail and return an error
def check_finetune_classes(train_file, valid_file):
    train_classes = set()
    valid_classes = set()
    with open(train_file, 'r') as json_file:
        json_list = list(json_file)
        print(len(json_list))
        for json_str in json_list:
            train_classes.add(json.loads(json_str)['completion'])
    with open(valid_file, 'r') as json_file:
        json_list = list(json_file)
        print(len(json_list))
        for json_str in json_list:
            valid_classes.add(json.loads(json_str)['completion'])
    if len(train_classes) == len(valid_classes):
        print('All good')
    else:
        print('Classes do not match, please prepare data again')
Zero-shot Classification
We'll first assess the performance of the base models at classifying these transactions using a
simple prompt. We'll provide the model with 5 categories and a catch-all of "Could not classify"
for ones that it cannot place.
zero_shot_prompt = '''You are a data expert working for the National Library of Scotland.
You are analysing all transactions over £25,000 in value and classifying them into one of five categories.
The five categories are Building Improvement, Literature & Archive, Utility Bills, Professional Services and Software/IT.
If you can't tell what it is, say Could not classify
Transaction:
Supplier: SUPPLIER_NAME
Description: DESCRIPTION_TEXT
Value: TRANSACTION_VALUE
Building Improvement
Our first attempt is correct: M & J Ballantyne Ltd is a house builder, and the work performed is
indeed Building Improvement.
Let's expand the sample size to 25 and see how it performs, again with just a simple prompt to
guide it
test_transactions = transactions.iloc[:25]
test_transactions['Classification'] = test_transactions.apply(lambda x: classify_transaction(x,zero_s
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ipykernel_launche
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
test_transactions['Classification'].value_counts()
Building Improvement 14
Could not classify 5
Literature & Archive 3
Software/IT 2
Utility Bills 1
Name: Classification, dtype: int64
test_transactions.head(25)
Date | Supplier | Description | Transaction value (£) | Classification
Let's create embeddings from the small set that we've classified so far - we've made a set of
labelled examples by running the zero-shot classifier on 101 transactions from our dataset and
manually correcting the 15 "Could not classify" results that we got
Create embeddings
This initial section reuses the approach from the Get_embeddings_from_dataset Notebook to
create embeddings from a combined field concatenating all of our features
df = pd.read_csv('./data/labelled_transactions.csv')
df.head()
Date | Supplier | Description | Transaction value (£) | Classification
29/05/2017 | Morris & Spottiswood Ltd | George IV Bridge Work | 56448 | Building Improvement
101
embedding_path = './data/transactions_with_embeddings_100.csv'
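The embedding step itself is not shown in this extract; a minimal sketch of how the combined
field and its embeddings could be produced (the helper, the combined-field format, and the
model are assumptions that follow the patterns used elsewhere in this document):
from openai import OpenAI
client = OpenAI()
def get_embedding(text, model="text-embedding-3-small"):
    # embed a single string with the OpenAI embeddings endpoint
    return client.embeddings.create(input=[text], model=model).data[0].embedding
# concatenate the features into one text field per transaction
df['combined'] = (
    "Supplier: " + df['Supplier'].str.strip()
    + "; Description: " + df['Description'].str.strip()
    + "; Value: " + df['Transaction value (£)'].astype(str)
)
# embed each combined field and store the result for reuse
df['babbage_similarity'] = df.combined.apply(get_embedding)
df.to_csv(embedding_path, index=False)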
Now that we have our embeddings, let's see if classifying these into the categories we've named
gives us any more success.
fs_df = pd.read_csv(embedding_path)
fs_df["babbage_similarity"] = fs_df.babbage_similarity.apply(literal_eval).apply(np.array)
fs_df.head()
Unnamed: 0 | Date | Supplier | Description | Transaction value (£) | Classification | combined | n_tokens
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# split the embedding features and labels into train and test sets (feature preparation assumed as above)
X_train, X_test, y_train, y_test = train_test_split(list(fs_df.babbage_similarity.values), fs_df.Classification, test_size=0.2, random_state=42)
clf = RandomForestClassifier(n_estimators=100)
clf.fit(X_train, y_train)
preds = clf.predict(X_test)
probas = clf.predict_proba(X_test)
accuracy 0.95 21
macro avg 0.78 0.80 0.79 21
weighted avg 0.91 0.95 0.93 21
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/metrics/_
_warn_prf(average, modifier, msg_start, len(result))
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/metrics/_
_warn_prf(average, modifier, msg_start, len(result))
/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/metrics/_
_warn_prf(average, modifier, msg_start, len(result))
Performance for this model is pretty strong, so creating embeddings and using even a simpler
classifier looks like an effective approach as well, with the zero-shot classifier helping us do the
initial classification of the unlabelled dataset.
Let's take it one step further and see if a fine-tuned model trained on this same labelled dataset
gives us comparable results
For this use case we're going to try to improve on the few-shot classification from above by
training a fine-tuned model on the same labelled set of 101 transactions and applying this fine-
tuned model to a group of unseen transactions
First we'll list out our classes and replace them with numeric identifiers. Making the model
predict a single token rather than multiple consecutive ones like 'Building Improvement'
should give us better results
We also need to add a common prefix and suffix to each example to aid the model in
making predictions - in our case our text already starts with 'Supplier' and we'll add a
suffix of '\n\n###\n\n'
Lastly we'll add a leading whitespace onto each of our target classes for classification, again
to aid the model
ft_prep_df = fs_df.copy()
len(ft_prep_df)
101
ft_prep_df.head()
Unnamed: 0 | Date | Supplier | Description | Transaction value (£) | Classification | combined | n_tokens
classes = list(set(ft_prep_df['Classification']))
class_df = pd.DataFrame(classes).reset_index()
class_df.columns = ['class_id','class']
class_df , len(class_df)
( class_id class
0 0 Literature & Archive
1 1 Utility Bills
2 2 Building Improvement
3 3 Software/IT
4 4 Other,
5)
ft_df_with_class = ft_prep_df.merge(class_df,left_on='Classification',right_on='class',how='inner')
# Adding a common separator onto the end of each prompt so the model knows when a prompt is terminating
ft_df_with_class['prompt'] = ft_df_with_class.apply(lambda x: x['combined'] + '\n\n###\n\n',axis=1)
ft_df_with_class.head()
Unnamed: 0 | Date | Supplier | Description | Transaction value (£) | Classification | combined | n_tokens
import random
# This step is to remove any existing files if we've already produced training/validation sets for th
#!rm transactions_grouped*
# We output our shuffled dataframe to a .jsonl file and run the prepare_data function to get us our i
ft_df_sorted.to_json("transactions_grouped.jsonl", orient='records', lines=True)
!openai tools fine_tunes.prepare_data -f transactions_grouped.jsonl -q
# This functions checks that your classes all appear in both prepared files
# If they don't, the fine-tuned model creation will fail
check_finetune_classes('transactions_grouped_prepared_train.jsonl','transactions_grouped_prepared_valid.jsonl')
31
8
All good
# You can use the following command to get fine-tuning job status and model name, replace the job name with your own
#!openai api fine_tunes.get -i ft-YBIc01t4hxYBC7I5qhRF3Qdx
Now we'll apply our classifier to see how it performs. We only had 31 unique observations in
our training set and 8 in our validation set, so let's see how the performance is.
(DataFrame preview of the test set with prompt and completion columns)
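The prediction step itself isn't shown here; these classic fine-tunes were served on the legacy Completions endpoint (pre-1.0 openai SDK), so a minimal sketch, with a placeholder model name, might look like:
fine_tuned_model = 'ada:ft-your-org-2023-01-01-00-00-00'  # placeholder - use the name from your fine-tune job

def classify_transaction(prompt, model=fine_tuned_model):
    # ask for a single token back, since each completion is a one-token class id
    response = openai.Completion.create(model=model, prompt=prompt, temperature=0, max_tokens=1)
    return response['choices'][0]['text']

test_set['predicted'] = test_set['prompt'].apply(classify_transaction)
test_set['result'] = test_set.apply(lambda x: str(x['predicted']).strip() == str(x['completion']).strip(), axis=1)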
test_set['result'].value_counts()
True 4
False 4
Name: result, dtype: int64
Performance is not great - unfortunately this is expected. With only a few examples of each
class, the above approach with embeddings and a traditional classifier worked better.
A fine-tuned model works best with a large number of labelled observations. If we had a few
hundred or thousand we may get better results, but let's do one last test on a holdout set to
confirm that it doesn't generalise well to a new set of observations.
holdout_df = transactions.copy().iloc[101:]
holdout_df.head()
holdout_df.head(10)
(DataFrame preview of the holdout transactions with Date, Supplier, Description, Transaction value (£), combined and prediction_result columns)
holdout_df['pred'].value_counts()
2 231
0 27
Name: pred, dtype: int64
Well those results were similarly underwhelming - so we've learned that with a dataset with a
small number of labelled observations, either zero-shot classification or traditional classification
with embeddings return better results than a fine-tuned model.
A fine-tuned model is still a great tool, but is more effective when you have a larger number of
labelled examples for each class that you're looking to classify
This notebook shows how to use OpenAI's DALL·E image API endpoints.
Setup
Set your OpenAI API key: you can do this by running `export OPENAI_API_KEY="your API key"` in your terminal.
# imports
from openai import OpenAI # OpenAI Python library to make API calls
import requests # used to download images
import os # used to access filepaths
from PIL import Image # used to print and edit images
image_dir='./images'
Generations
The generation API endpoint creates an image based on a text prompt. API Reference
Required inputs:
prompt (str): A text description of the desired image(s). The maximum length is 1000 characters.
Optional inputs:
model (str): The model to use for image generation. Defaults to dall-e-2
n (int): The number of images to generate. Must be between 1 and 10. Defaults to 1.
quality (str): The quality of the image that will be generated. hd creates images with finer
details and greater consistency across the image. This param is only supported for dall-e-3.
response_format (str): The format in which the generated images are returned. Must be one of url or b64_json. Defaults to url.
size (str): The size of the generated images. Must be one of 256x256, 512x512, or
1024x1024 for dall-e-2. Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3
models. Defaults to "1024x1024".
style (str | null): The style of the generated images. Must be one of vivid or natural. Vivid
causes the model to lean towards generating hyper-real and dramatic images. Natural
causes the model to produce more natural, less hyper-real looking images. This param is
only supported for dall-e-3.
user (str): A unique identifier representing your end-user, which will help OpenAI to monitor and detect abuse. Learn more.
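The generation call itself isn't reproduced in this excerpt; a minimal sketch, with an illustrative prompt:
# instantiate the client (reads OPENAI_API_KEY from the environment)
client = OpenAI()

prompt = "A cyberpunk monkey hacker dreaming of a beautiful bunch of bananas, digital art"  # illustrative

generation_response = client.images.generate(
    model="dall-e-3",
    prompt=prompt,
    n=1,
    size="1024x1024",
    response_format="url",
)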
# print response
print(generation_response)
Variations
The variations endpoint generates new images (variations) similar to an input image. API
Reference
Required inputs:
image (str): The image to use as the basis for the variation(s). Must be a valid PNG file, less than 4MB, and square.
Optional inputs:
model (str): The model to use for image variations. Only dall-e-2 is supported at this time.
n (int): The number of images to generate. Must be between 1 and 10. Defaults to 1.
size (str): The size of the generated images. Must be one of "256x256", "512x512", or "1024x1024". Smaller images are faster. Defaults to "1024x1024".
response_format (str): The format in which the generated images are returned. Must be one of url or b64_json. Defaults to url.
user (str): A unique identifier representing your end-user, which will help OpenAI to
monitor and detect abuse. Learn more.
# create variations
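# (a minimal sketch -- assumes a square PNG from the generation step was saved to this illustrative path)
generated_image_filepath = os.path.join(image_dir, "generated_image.png")
variation_response = client.images.create_variation(
    image=open(generated_image_filepath, "rb"),
    n=2,
    size="1024x1024",
    response_format="url",
)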
# print response
print(variation_response)
Edits
The edit endpoint uses DALL·E to generate a specified portion of an existing image. Three inputs
are needed: the image to edit, a mask specifying the portion to be regenerated, and a prompt
describing the desired image. API Reference
Required inputs:
image (str): The image to edit. Must be a valid PNG file, less than 4MB, and square. If mask
is not provided, image must have transparency, which will be used as the mask.
prompt (str): A text description of the desired image(s). The maximum length is 1000
characters.
Optional inputs:
mask (file): An additional image whose fully transparent areas (e.g. where alpha is zero)
indicate where image should be edited. Must be a valid PNG file, less than 4MB, and have
the same dimensions as image.
model (str): The model to use for edit image. Only dall-e-2 is supported at this time.
n (int): The number of images to generate. Must be between 1 and 10. Defaults to 1.
size (str): The size of the generated images. Must be one of "256x256", "512x512", or
"1024x1024". Smaller images are faster. Defaults to "1024x1024".
response_format (str): The format in which the generated images are returned. Must be one of url or b64_json. Defaults to url.
user (str): A unique identifier representing your end-user, which will help OpenAI to
monitor and detect abuse. Learn more.
Set Edit Area
An edit requires a "mask" to specify which portion of the image to regenerate. Any pixel with an
alpha of 0 (transparent) will be regenerated. The code below creates a 1024x1024 mask where
the bottom half is transparent.
# create a mask
width = 1024
height = 1024
mask = Image.new("RGBA", (width, height), (0, 0, 0, 1)) # create an opaque image mask
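# make the bottom half transparent (alpha 0) so that it will be regenerated by the edit
for x in range(width):
    for y in range(height // 2, height):
        mask.putpixel((x, y), (0, 0, 0, 0))
# save the mask so it can be passed to the edit endpoint (illustrative path)
mask.save(os.path.join(image_dir, "bottom_half_mask.png"))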
Perform Edit
Now we supply our image, caption and mask to the API to get 5 examples of edits to our image
# edit an image
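# a minimal sketch of the edit call -- the image path and caption are illustrative
edit_response = client.images.edit(
    image=open(os.path.join(image_dir, "generated_image.png"), "rb"),
    mask=open(os.path.join(image_dir, "bottom_half_mask.png"), "rb"),
    prompt=prompt,  # reuse the original caption
    n=5,            # the 5 example edits mentioned above
    size="1024x1024",
    response_format="url",
)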
# print response
print(edit_response)
Named Entity Recognition (NER) is a Natural Language Processing task that identifies and
classifies named entities (NE) into predefined semantic categories (such as persons,
organizations, locations, events, time expressions, and quantities). By converting raw text into
structured information, NER makes data more actionable, facilitating tasks like information
extraction, data aggregation, analytics, and social media monitoring.
This notebook demonstrates how to carry out NER with chat completions and function calling
to enrich a text with links to a knowledge base such as Wikipedia:
Text:
In Germany, in 1440, goldsmith Johannes Gutenberg invented the movable-type printing press. His
work led to an information revolution and the unprecedented mass-spread of literature
throughout Europe. Modelled on the design of the existing screw presses, a single Renaissance
movable-type printing press could produce up to 3,600 pages per workday.
(The same passage, enriched with knowledge base links on the recognised entities; the hyperlinks do not survive in this text export.)
Inference Costs: The notebook also illustrates how to estimate OpenAI API costs.
1. Setup
1.1 Install/Upgrade Python packages
Note: you may need to restart the kernel to use updated packages.
You can generate an API key in the OpenAI web interface. See
https://platform.openai.com/account/api-keys for details.
This notebook works with the latest OpenAI models gpt-3.5-turbo-0613 and gpt-4-0613.
import json
import logging
import os
import openai
import wikipedia
OPENAI_MODEL = 'gpt-3.5-turbo-0613'
We define a standard set of NER labels to showcase a wide range of use cases. However, for our
specific task of enriching text with knowledge base links, only a subset is practically required.
labels = [
"person", # people, including fictional characters
"fac", # buildings, airports, highways, bridges
"org", # organizations, companies, agencies, institutions
"gpe", # geopolitical entities like countries, cities, states
"loc", # non-gpe locations
"product", # vehicles, foods, appareal, appliances, software, toys
"event", # named sports, scientific milestones, historical events
"work_of_art", # titles of books, songs, movies
"law", # named laws, acts, or legislations
"language", # any named language
"date", # absolute or relative dates or periods
"time", # time units smaller than a day
"percent", # percentage (e.g., "twenty percent", "18%")
"money", # monetary values, including unit
"quantity", # measurements, e.g., weight or distance
]
3. Prepare messages
The chat completions API takes a list of messages as input and delivers a model-generated
message as an output. While the chat format is primarily designed for facilitating multi-turn
conversations, it is equally efficient for single-turn tasks without any preceding conversation. For
our purposes, we will specify a message for the system, assistant, and user roles.
The system message (prompt) sets the assistant's behavior by defining its desired persona and
task. We also delineate the specific set of entity labels we aim to identify.
Although one can instruct the model to format its response, it has to be noted that both gpt-
3.5-turbo-0613 and gpt-4-0613 have been fine-tuned to discern when a function should be
invoked, and to reply with JSON formatted according to the function's signature. This capability
streamlines our prompt and enables us to receive structured data directly from the model.
def system_message(labels):
    return f"""
You are an expert in Natural Language Processing. Your task is to identify common Named Entities (NER) in a given text.
The possible common Named Entities (NER) types are exclusively: ({", ".join(labels)})."""
Assistant messages usually store previous assistant responses. However, as in our scenario,
they can also be crafted to provide examples of the desired behavior. While OpenAI is able to
execute zero-shot Named Entity Recognition, we have found that a one-shot approach
produces more precise results.
def assistant_message():
    return f"""
EXAMPLE:
Text: 'In Germany, in 1440, goldsmith Johannes Gutenberg invented the movable-type printing press. His work led to an information revolution and the unprecedented mass-spread of literature throughout Europe. Modelled on the design of the existing screw presses, a single Renaissance movable-type printing press could produce up to 3,600 pages per workday.'
{{
"gpe": ["Germany", "Europe"],
"date": ["1440"],
"person": ["Johannes Gutenberg"],
"product": ["movable-type printing press"],
"event": ["Renaissance"],
"quantity": ["3,600 pages"],
"time": ["workday"]
}}
--"""
The user message provides the specific text for the assistant task:
def user_message(text):
return f"""
TASK:
Text: {text}
"""
In an OpenAI API call, we can describe functions to gpt-3.5-turbo-0613 and gpt-4-0613 and
have the model intelligently choose to output a JSON object containing arguments to call those
functions . It's important to note that the chat completions API doesn't actually execute the
function . Instead, it provides the JSON output, which can then be used to call the function
in our code. For more details, refer to the OpenAI Function Calling Guide.
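The notebook's helper functions are not reproduced in this excerpt; a minimal sketch of what they might look like, using the wikipedia package imported above (the implementation below is illustrative, not the notebook's exact code):
def find_link(entity: str):
    """Return a Wikipedia URL for an entity, or None if no page can be found."""
    try:
        page = wikipedia.page(entity, auto_suggest=False)
        return page.url
    except (wikipedia.exceptions.PageError, wikipedia.exceptions.DisambiguationError):
        return None

def enrich_entities(text: str, label_entities: dict) -> str:
    """Replace recognised entities in the text with Markdown links to Wikipedia."""
    for entities in label_entities.values():
        for entity in entities:
            url = find_link(entity)
            if url is not None:
                text = text.replace(entity, f"[{entity}]({url})")
    return text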
4. ChatCompletion
2. Invoke the model using the messages , tools and tool_choice parameters.
3. Convert the output into a JSON object, and then call the function with the arguments
provided by the model.
In practice, one might want to re-invoke the model again by appending the function response
as a new message, and let the model summarize the results back to the user. Nevertheless, for
our purposes, this step is not needed.
Note that in a real-case scenario it is strongly recommended to build in user confirmation flows
before taking actions.
Since we want the model to output a dictionary of labels and recognized entities:
{
"gpe": ["Germany", "Europe"],
"date": ["1440"],
"person": ["Johannes Gutenberg"],
"product": ["movable-type printing press"],
"event": ["Renaissance"],
"quantity": ["3,600 pages"],
"time": ["workday"]
}
we need to define the corresponding JSON schema to be passed to the tools parameter:
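The schema itself isn't included in this excerpt; a minimal sketch of what the tools definition could look like (the exact property layout in the notebook may differ):
tools = [
    {
        "type": "function",
        "function": {
            "name": "enrich_entities",
            "description": "Enrich text with knowledge base links for the recognised entities",
            "parameters": {
                "type": "object",
                "properties": {
                    "label_entities": {
                        "type": "object",
                        "description": "Dictionary of labels and recognised entities, e.g. {'gpe': ['Germany', 'Europe']}",
                    }
                },
                "required": ["label_entities"],
            },
        },
    }
]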
Now, we invoke the model. It's important to note that we direct the API to use a specific
function by setting the tool_choice parameter to {"type": "function", "function" :
{"name": "enrich_entities"}} .
response_message = response.choices[0].message
function_to_call = available_functions[function_name]
logging.info(f"function_to_call: {function_to_call}")
function_args = json.loads(response_message.tool_calls[0].function.arguments)
logging.info(f"function_args: {function_args}")
display(Markdown(f"""**Text:** {text}
**Enriched_Text:** {result['function_response']}"""))
<IPython.core.display.Markdown object>
To estimate the inference costs, we can parse the response's "usage" field. Detailed token costs
per model are available in the OpenAI Pricing Guide:
print(f"""Token Usage
Prompt: {i_tokens} tokens
Completion: {o_tokens} tokens
Cost estimation: ${round(i_cost + o_cost, 5)}""")
Token Usage
Prompt: 331 tokens
Completion: 47 tokens
Cost estimation: $0.00059
Perform semantic search on the Elasticsearch index using the encoded question
Send the top search results to the OpenAI Chat Completions API endpoint for retrieval
augmented generation (RAG)
ℹ️ If you've already worked through our semantic search notebook, you can skip ahead to the
final step!
# install packages
# import modules
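The install and import cells aren't reproduced in this excerpt; based on the code that follows, a minimal set might be:
# !pip install elasticsearch openai pandas wget
import json
import os
import zipfile

import openai
import pandas as pd
import wget
from elasticsearch import Elasticsearch, helpers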
Connect to Elasticsearch
ℹ️ We're using an Elastic Cloud deployment of Elasticsearch for this notebook. If you don't
already have an Elastic deployment, you can sign up for a free Elastic Cloud trial.
To connect to Elasticsearch, you need to create a client instance with the Cloud ID and password
for your deployment.
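The connection code isn't shown here; a minimal sketch, assuming the Cloud ID and password are stored in environment variables (the variable names below are illustrative):
client = Elasticsearch(
    cloud_id=os.environ["ELASTIC_CLOUD_ID"],
    basic_auth=("elastic", os.environ["ELASTIC_CLOUD_PASSWORD"]),
)
print(client.info())  # quick check that the connection works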
In this step we download the OpenAI Wikipedia embeddings dataset, and extract the zip file.
embeddings_url = 'https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedded.zip'
wget.download(embeddings_url)
with zipfile.ZipFile("vector_database_wikipedia_articles_embedded.zip",
"r") as zip_ref:
zip_ref.extractall("data")
wikipedia_dataframe = pd.read_csv("data/vector_database_wikipedia_articles_embedded.csv")
Now we need to create an Elasticsearch index with the necessary mappings. This will enable us
to index the data into Elasticsearch.
We use the dense_vector field type for the title_vector and content_vector fields. This is a
special field type that allows us to store dense vectors in Elasticsearch.
Later, we'll need to target the dense_vector field for kNN search.
index_mapping= {
"properties": {
"title_vector": {
"type": "dense_vector",
"dims": 1536,
"index": "true",
"similarity": "cosine"
},
"content_vector": {
"type": "dense_vector",
"dims": 1536,
"index": "true",
"similarity": "cosine"
},
"text": {"type": "text"},
"title": {"type": "text"},
"url": { "type": "keyword"},
"vector_id": {"type": "long"}
}
}
client.indices.create(index="wikipedia_vector_index", mappings=index_mapping)
The following function generates the required bulk actions that can be passed to Elasticsearch's
Bulk API, so we can index multiple documents efficiently in a single request.
For each row in the DataFrame, the function yields a dictionary representing a single document
to be indexed.
def dataframe_to_bulk_actions(df):
for index, row in df.iterrows():
yield {
"_index": 'wikipedia_vector_index',
"_id": row['id'],
"_source": {
'url' : row["url"],
'title' : row["title"],
'text' : row["text"],
'title_vector' : json.loads(row["title_vector"]),
'content_vector' : json.loads(row["content_vector"]),
'vector_id' : row["vector_id"]
}
}
As the dataframe is large, we will index data in batches of 100 . We index the data into
Elasticsearch using the Python client's helpers for the bulk API.
start = 0
end = len(wikipedia_dataframe)
batch_size = 100
for batch_start in range(start, end, batch_size):
batch_end = min(batch_start + batch_size, end)
batch_dataframe = wikipedia_dataframe.iloc[batch_start:batch_end]
actions = dataframe_to_bulk_actions(batch_dataframe)
helpers.bulk(client, actions)
print(client.search(index="wikipedia_vector_index", body={
"_source": {
"excludes": ["title_vector", "content_vector"]
},
"query": {
"match": {
"text": {
"query": "Hummingbird"
}
}
}
}))
To perform kNN search, we need to encode queries with the same embedding model used to
encode the documents at index time. In this example, we need to use the text-embedding-3-
small model.
# Define model
EMBEDDING_MODEL = "text-embedding-3-small"
# Define question
question = 'Is the Atlantic the biggest ocean in the world?'
# Create embedding
question_embedding = openai.Embedding.create(input=question, model=EMBEDDING_MODEL)
Now we're ready to run queries against our Elasticsearch index using our encoded question.
We'll be doing a k-nearest neighbors search, using the Elasticsearch kNN query option.
def pretty_response(response):
for hit in response['hits']['hits']:
id = hit['_id']
score = hit['_score']
title = hit['_source']['title']
text = hit['_source']['text']
pretty_output = (f"\nID: {id}\nTitle: {title}\nSummary: {text}\nScore: {score}")
print(pretty_output)
response = client.search(
index = "wikipedia_vector_index",
knn={
"field": "content_vector",
"query_vector": question_embedding["data"][0]["embedding"],
"k": 10,
"num_candidates": 100
}
)
pretty_response(response)
ID: 1936
Title: Atlantic Ocean
Summary: The Atlantic Ocean is the world's second largest ocean. It covers a total area of abo
Geologic history
The Atlantic formed when the Americas moved west from Eurasia and Africa. This began sometime i
The east coast of South America is shaped somewhat like the west coast of Africa, and this gave
Geography
The Atlantic Ocean is bounded on the west by North and South America. It connects to the Arctic
In the southeast, the Atlantic merges into the Indian Ocean. The 20° East meridian defines its
In the southwest, the Drake Passage connects it to the Pacific Ocean. The Panama Canal links th
The Atlantic Ocean is second in size to the Pacific. It occupies an area of about . The volume
The average depth of the Atlantic, along with its adjacent seas, is . The greatest depth is Mil
Gulf Stream
The Atlantic Ocean has important ocean currents. One of these, called the Gulf Stream, flows a
There are currents in the South Atlantic too, but the shape of this sea means that it has less
Geology
The main feature of the Atlantic Ocean's seabed is a large underwater mountain chain called the
Success! We've used kNN to perform semantic search over our dataset and found the top
results.
Now we can use the Chat Completions API to work some generative AI magic using the top
search result as additional context.
Now we can send the question and the text to OpenAI's chat completion API.
Using an LLM together with a retrieval model is known as retrieval augmented generation
(RAG). We're using Elasticsearch to do what it does best, retrieve relevant documents. Then we
use the LLM to do what it does best, tasks like generating summaries and answering questions,
using the retrieved documents as context.
The model will generate a response to the question, using the top kNN hit as context. Use the
messages list to shape your prompt to the model. In this example, we're using the gpt-3.5-
turbo model.
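The step that pulls the context out of the top kNN hit isn't shown above; a minimal sketch:
# use the text of the highest-scoring hit as context for the chat completion
top_hit_summary = response['hits']['hits'][0]['_source']['text']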
summary = openai.ChatCompletion.create(
model="gpt-3.5-turbo",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Answer the following question:"
+ question
+ "by using the following text:"
+ top_hit_summary},
]
)
choices = summary.choices
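A simple way to print the answer from the response (assuming the pre-1.0 SDK response object used above):
print("-" * 60)
for choice in choices:
    print(choice.message.content)
print("-" * 60)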
------------------------------------------------------------
No, the Atlantic Ocean is not the biggest ocean in the world. It is the second largest ocean, c
------------------------------------------------------------
Code explanation
Sends a conversation containing a system message and a user message to the model
The user message contains a question as specified in the original kNN query and some
input text
Next steps
That was just one example of how to combine Elasticsearch with the power of OpenAI's models,
to enable retrieval augmented generation. RAG allows you to avoid the costly and complex
process of training or fine-tuning models, by leveraging out-of-the-box models, enhanced with
additional context.
To adapt the conversation for different use cases, customize the system message to define the
assistant's behavior or persona. Adjust the user message to specify the task, such as
summarization or question answering, along with the desired format of the response.
This example will cover embeddings using the Azure OpenAI service.
Setup
First, we install the necessary dependencies and import the libraries we will be using.
import os
import openai
import dotenv
dotenv.load_dotenv()
Authentication
The Azure OpenAI service supports multiple authentication mechanisms that include API keys
and Azure Active Directory token credentials.
use_azure_active_directory = False # Set this flag to True if you are using Azure Active Directory
To set up the OpenAI SDK to use an Azure API Key, we need to set api_key to a key associated
with your endpoint (you can find this key in "Keys and Endpoints" under "Resource Management"
in the Azure Portal). You'll also find the endpoint for your resource here.
if not use_azure_active_directory:
endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_API_KEY"]
client = openai.AzureOpenAI(
azure_endpoint=endpoint,
api_key=api_key,
api_version="2023-09-01-preview"
)
Let's now see how we can authenticate via Azure Active Directory. We'll start by installing the
azure-identity library. This library will provide the token credentials we need to authenticate
and help us build a token credential provider through the get_bearer_token_provider helper
function. It's recommended to use get_bearer_token_provider over providing a static token to
AzureOpenAI because this API will automatically cache and refresh tokens for you.
For more information on how to set up Azure Active Directory authentication with Azure
OpenAI, see the documentation.
if use_azure_active_directory:
    from azure.identity import DefaultAzureCredential, get_bearer_token_provider

    endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]

    client = openai.AzureOpenAI(
        azure_endpoint=endpoint,
        azure_ad_token_provider=get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"),
        api_version="2023-09-01-preview"
    )
“Note: the AzureOpenAI client infers the following arguments from their corresponding
environment variables if they are not provided: api_key from AZURE_OPENAI_API_KEY, azure_ad_token from AZURE_OPENAI_AD_TOKEN, api_version from OPENAI_API_VERSION, and azure_endpoint from AZURE_OPENAI_ENDPOINT.”
Deployments
In this section we are going to create a deployment of a model that we can use to create
embeddings.
Let's deploy a model to use with embeddings. Go to https://portal.azure.com, find your Azure
OpenAI resource, and then navigate to the Azure OpenAI Studio. Click on the "Deployments"
tab and then create a deployment for the model you want to use for embeddings. The
deployment name that you give the model will be used in the code below.
deployment = "" # Fill in the deployment name from the portal here
Embeddings
embeddings = client.embeddings.create(
model=deployment,
input="The food was delicious and the waiter..."
)
print(embeddings)
Customizing embeddings
Ted Sanders, Boris Power
Mar 9, 2022
This notebook demonstrates one way to customize OpenAI embeddings to a particular task.
The input is training data in the form of [text_1, text_2, label] where label is +1 if the pairs are
similar and -1 if the pairs are dissimilar.
The output is a matrix that you can use to multiply your embeddings. The product of this
multiplication is a 'custom embedding' that will better emphasize aspects of the text relevant to
your use case. In binary classification use cases, we've seen error rates drop by as much as 50%.
In the following example, I use 1,000 sentence pairs picked from the SNLI corpus. Each pair of
sentences are logically entailed (i.e., one implies the other). These pairs are our positives (label =
1). We generate synthetic negatives by combining sentences from different pairs, which are
presumed to not be logically entailed (label = -1).
For a clustering use case, you can generate positives by creating pairs from texts in the same
clusters and generate negatives by creating pairs from sentences in different clusters.
With other data sets, we have seen decent improvement with as little as ~100 training
examples. Of course, performance will be better with more examples.
0. Imports
# imports
from typing import List, Tuple # for type hints
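The full import cell isn't reproduced here; based on the code that follows, it likely also needs something like the following (plus a get_embedding helper for the embeddings endpoint):
import pickle  # for caching embeddings to disk
import random  # for run ids

import numpy as np
import pandas as pd
import torch  # for optimizing the custom matrix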
1. Inputs
Most inputs are here. The key things to change are where to load your dataset from, where to
save a cache of embeddings to, and which embedding engine you want to use.
Depending on how your data is formatted, you'll want to rewrite the process_input_data
function.
# input parameters
embedding_cache_path = "data/snli_embedding_cache.pkl" # embeddings will be saved/loaded here
default_embedding_engine = "text-embedding-3-small"
num_pairs_to_embed = 1000 # 1000 is arbitrary
local_dataset_path = "data/snli_1.0_train_2k.csv" # download from: https://nlp.stanford.edu/projects
# load data
df = pd.read_csv(local_dataset_path)
# view data
df.head()
(pandas emitted a SettingWithCopyWarning here, recommending .loc[row_indexer, col_indexer] = value; see the pandas indexing documentation for details)
df["label"] = df["label"].apply(lambda x: {"entailment": 1, "contradiction": -1}[x])
2 A person on a horse jumps over a broken down a... A person is outdoors, on a horse. 1
7 A boy is jumping on skateboard in the middle o... The boy does a skateboarding trick. 1
14 Two blond women are hugging one another. There are women showing affection. 1
17 A few people in a restaurant setting, one of t... The diners are at a restaurant. 1
Note that it's important to split data into training and test sets before generating synthetic
negatives or positives. You don't want any text strings in the training data to show up in the test
data. If there's contamination, the test metrics will look better than they'll actually be in
production.
This is another piece of the code that you will need to modify to match your use case.
If you have data with positives and negatives, you can skip this section.
If you have data with only positives, you can mostly keep it as is, where it generates negatives
only.
If you have multiclass data, you will want to generate both positives and negatives. The
positives can be pairs of text that share labels, and the negatives can be pairs of text that do not
share labels.
The final output should be a dataframe with text pairs, where each pair is labeled -1 or 1.
# generate negatives
def dataframe_of_negatives(dataframe_of_positives: pd.DataFrame) -> pd.DataFrame:
"""Return dataframe of negative pairs made by combining elements of positive pairs."""
texts = set(dataframe_of_positives["text_1"].values) | set(
dataframe_of_positives["text_2"].values
)
all_pairs = {(t1, t2) for t1 in texts for t2 in texts if t1 < t2}
positive_pairs = set(
tuple(text_pair)
for text_pair in dataframe_of_positives[["text_1", "text_2"]].values
)
negative_pairs = all_pairs - positive_pairs
df_of_negatives = pd.DataFrame(list(negative_pairs), columns=["text_1", "text_2"])
df_of_negatives["label"] = -1
return df_of_negatives
negatives_per_positive = (
1 # it will work at higher values too, but more data will be slower
)
# generate negatives for training dataset
train_df_negatives = dataframe_of_negatives(train_df)
train_df_negatives["dataset"] = "train"
# generate negatives for test dataset
test_df_negatives = dataframe_of_negatives(test_df)
test_df_negatives["dataset"] = "test"
# sample negatives and combine with positives
train_df = pd.concat(
[
train_df,
train_df_negatives.sample(
n=len(train_df) * negatives_per_positive, random_state=random_seed
),
]
)
test_df = pd.concat(
[
test_df,
test_df_negatives.sample(
n=len(test_df) * negatives_per_positive, random_state=random_seed
),
]
)
df = pd.concat([train_df, test_df])
5. Calculate embeddings and cosine similarities
Here, I create a cache to save the embeddings. This is handy so that you don't have to pay again
if you want to run the code again.
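The cache set-up itself isn't shown; a minimal sketch that loads an existing cache from disk, or starts a new one:
# the cache maps (text, engine) -> embedding and is persisted as a pickle file
try:
    with open(embedding_cache_path, "rb") as f:
        embedding_cache = pickle.load(f)
except FileNotFoundError:
    embedding_cache = {}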
# this function will get embeddings from the cache and save them there afterward
def get_embedding_with_cache(
text: str,
engine: str = default_embedding_engine,
embedding_cache: dict = embedding_cache,
embedding_cache_path: str = embedding_cache_path,
) -> list:
if (text, engine) not in embedding_cache.keys():
# if not in cache, call API to get embedding
embedding_cache[(text, engine)] = get_embedding(text, engine)
# save embeddings cache to disk after each update
with open(embedding_cache_path, "wb") as embedding_cache_file:
pickle.dump(embedding_cache, embedding_cache_file)
return embedding_cache[(text, engine)]
Here we measure similarity of text using cosine similarity. In our experience, most distance
functions (L1, L2, cosine similarity) all work about the same. Note that our embeddings are
already normalized to length 1, so cosine similarity is equivalent to dot product.
The graphs show how much overlap there is between the distributions of cosine similarities
for similar and dissimilar pairs. If there is a high amount of overlap, that means there are some
dissimilar pairs with greater cosine similarity than some similar pairs.
The accuracy I compute is the accuracy of a simple rule that predicts 'similar (1)' if the cosine
similarity is above some threshold X and otherwise predicts 'dissimilar (0)'.
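As an illustration of that rule (the helper below is a sketch; the notebook's own accuracy_and_se helper also reports a standard error):
def accuracy_at_threshold(cosine_similarities, labels, threshold):
    """Accuracy of predicting 'similar' (1) when the cosine similarity exceeds the threshold."""
    predictions = np.where(np.asarray(cosine_similarities) > threshold, 1, -1)
    return (predictions == np.asarray(labels)).mean()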
def optimize_matrix(
modified_embedding_length: int = 2048, # in my brief experimentation, bigger was better (2048 is
batch_size: int = 100,
max_epochs: int = 100,
learning_rate: float = 100.0, # seemed to work best when similar to batch size - feel free to tr
dropout_fraction: float = 0.0, # in my testing, dropout helped by a couple percentage points (de
df: pd.DataFrame = df,
print_progress: bool = True,
save_results: bool = True,
) -> torch.tensor:
"""Return matrix optimized to minimize loss on training data."""
run_id = random.randint(0, 2 ** 31 - 1) # (range is arbitrary)
# convert from dataframe to torch tensors
# e is for embedding, s for similarity label
def tensors_from_dataframe(
df: pd.DataFrame,
embedding_column_1: str,
embedding_column_2: str,
similarity_label_column: str,
) -> Tuple[torch.tensor]:
e1 = np.stack(np.array(df[embedding_column_1].values))
e2 = np.stack(np.array(df[embedding_column_2].values))
s = np.stack(np.array(df[similarity_label_column].astype("float").values))
e1 = torch.from_numpy(e1).float()
e2 = torch.from_numpy(e2).float()
s = torch.from_numpy(s).float()
data = pd.DataFrame(
{"epoch": epochs, "type": types, "loss": losses, "accuracy": accuracies}
)
data["run_id"] = run_id
data["modified_embedding_length"] = modified_embedding_length
data["batch_size"] = batch_size
data["max_epochs"] = max_epochs
data["learning_rate"] = learning_rate
data["dropout_fraction"] = dropout_fraction
data[
"matrix"
] = matrices # saving every single matrix can get big; feel free to delete/change
if save_results is True:
data.to_csv(f"{run_id}_optimization_results.csv", index=False)
return data
runs_df = pd.concat(results)
8. Plot the before & after, showing the results of the best
matrix found during training
The better the matrix is, the more cleanly it will separate the similar and dissimilar pairs.
a, se = accuracy_and_se(test_df["cosine_similarity_custom"], test_df["label"])
print(f"Test accuracy after customization: {a:0.1%} ± {1.96 * se:0.1%}")
Supabase Vector is a vector toolkit built on pgvector, a Postgres extension that allows you to
store your embeddings inside the same database that holds the rest of your application data.
When combined with pgvector's indexing algorithms, vector search remains fast at large scales.
Supabase adds an ecosystem of services and tools on top of Postgres that makes app
development as quick as possible, including:
Realtime APIs
Authentication
File storage
Edge functions
We can use these services alongside pgvector to store and query embeddings within Postgres.
Below are guides and resources that walk you through how to use OpenAI embedding models
with Supabase Vector.
Guide Description
Semantic search Store, index, and query embeddings at scale using pgvector
Additional resources
Vector columns
Vector indexes
Going to production
Deciding on compute
Azure Data Explorer aka Kusto is a cloud-based data analytics service that enables users to
perform advanced analytics on large datasets in real-time. It is particularly well-suited for
handling large volumes of data, making it an excellent choice for storing and searching vectors.
Kusto supports a special data type called dynamic, which can store unstructured data such as
arrays and properties bag. Dynamic data type is perfect for storing vector values. You can
further augment the vector value by storing metadata related to the original object as separate
columns in your table.
Kusto also supports the built-in function series_cosine_similarity_fl to perform vector similarity
searches.
Demo Scenario
(Figures: Kusto_Vector, Wiki_embeddings, semantic_search_flow)
If you’d like to try this demo, please follow the instructions in the Notebook.
In this notebook we will classify the sentiment of reviews using embeddings and zero labeled
data! The dataset is created in the Get_embeddings_from_dataset Notebook.
We'll define positive sentiment to be 4- and 5-star reviews, and negative sentiment to be 1- and
2-star reviews. 3-star reviews are considered neutral and we won't use them for this example.
We will perform zero-shot classification by embedding descriptions of each class and then
comparing new samples to those class embeddings.
import pandas as pd
import numpy as np
from ast import literal_eval
EMBEDDING_MODEL = "text-embedding-3-small"
datafile_path = "data/fine_food_reviews_with_embeddings_1k.csv"
df = pd.read_csv(datafile_path)
df["embedding"] = df.embedding.apply(literal_eval).apply(np.array)
Zero-Shot Classification
To perform zero shot classification, we want to predict labels for our samples without any
training. To do this, we can simply embed short descriptions of each label, such as positive and
negative, and then compare the cosine distance between embeddings of samples and label
descriptions.
The highest similarity label to the sample input is the predicted label. We can also define a
prediction score to be the difference between the cosine distance to the positive and to the
negative label. This score can be used for plotting a precision-recall curve, which can be used to
select a different tradeoff between precision and recall, by selecting a different threshold.
def evaluate_embeddings_approach(
labels = ['negative', 'positive'],
model = EMBEDDING_MODEL,
):
label_embeddings = [get_embedding(label, model=model) for label in labels]
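    # a hedged sketch of the rest of the approach; assumes numpy embeddings (unit length, so the
    # dot product equals cosine similarity), a 'sentiment' column derived from the star ratings,
    # and sklearn's classification_report imported elsewhere in the notebook
    def label_score(review_embedding, label_embeddings):
        return np.dot(review_embedding, label_embeddings[1]) - np.dot(review_embedding, label_embeddings[0])

    probas = df["embedding"].apply(lambda e: label_score(e, label_embeddings))
    preds = probas.apply(lambda score: "positive" if score > 0 else "negative")
    print(classification_report(df.sentiment, preds))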
Using the search embeddings and descriptive names leads to an additional improvement in
performance.
As shown above, zero-shot classification with embeddings can lead to great results, especially
when the labels are more descriptive than just simple words.
Visualizing embeddings in 3D
Boris Power, Ted Sanders
Mar 10, 2022
The example uses PCA to reduce the dimensionality of the embeddings from 1536 to 3. Then
we can visualize the data points in a 3D plot. The small dataset dbpedia_samples.jsonl is
curated by randomly sampling 200 samples from the DBpedia validation dataset.
import pandas as pd
samples = pd.read_json("data/dbpedia_samples.jsonl", lines=True)
categories = sorted(samples["category"].unique())
print("Categories of DBpedia samples:", samples["category"].value_counts())
samples.head()
(DataFrame preview with text and category columns)
%matplotlib widget
import matplotlib.pyplot as plt
import numpy as np
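The PCA and figure set-up cells aren't reproduced here; a minimal sketch, assuming an 'embedding' column of 1536-dimensional vectors has already been computed for each sample:
from sklearn.decomposition import PCA

# reduce the 1536-dimensional embeddings to 3 dimensions for plotting
matrix = np.array(samples["embedding"].to_list())
pca = PCA(n_components=3)
samples["embed_vis"] = pca.fit_transform(matrix).tolist()

fig = plt.figure(figsize=(10, 5))
ax = fig.add_subplot(projection="3d")
cmap = plt.get_cmap("tab20")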
# Plot each sample category individually such that we can set label name.
for i, cat in enumerate(categories):
sub_matrix = np.array(samples[samples["category"] == cat]["embed_vis"].to_list())
x=sub_matrix[:, 0]
y=sub_matrix[:, 1]
z=sub_matrix[:, 2]
colors = [cmap(i/len(categories))] * len(sub_matrix)
ax.scatter(x, y, zs=z, zdir='z', c=colors, label=cat)
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
ax.legend(bbox_to_anchor=(1.1, 1))
<matplotlib.legend.Legend at 0x1622180a0>
The OpenAI API embeddings endpoint can be used to measure relatedness or similarity
between pieces of text.
Embeddings can be used for semantic search, recommendations, cluster analysis, near-duplicate
detection, and more.
For comparison with other embedding models, see Massive Text Embedding Benchmark
(MTEB) Leaderboard
Semantic search
Embeddings can be used for search either by themselves or as a feature in a larger system.
Split your text corpus into chunks smaller than the token limit (8,191 tokens for text-
embedding-3-small )
Embed each chunk of text
Store those embeddings in your own database or in a vector search provider like
Pinecone, Weaviate or Qdrant
In more advanced search systems, the cosine similarity of embeddings can be used as one
feature among many in ranking search results.
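A minimal sketch of that search flow, assuming the v1 Python SDK and a small in-memory corpus (the texts below are illustrative):
from openai import OpenAI
import numpy as np

client = OpenAI()

def embed(texts, model="text-embedding-3-small"):
    response = client.embeddings.create(model=model, input=texts)
    return [np.array(item.embedding) for item in response.data]

chunks = ["First document chunk...", "Second document chunk..."]  # your pre-chunked corpus
chunk_embeddings = embed(chunks)
query_embedding = embed(["my search query"])[0]

# OpenAI embeddings are unit length, so the dot product equals cosine similarity
scores = [float(np.dot(query_embedding, e)) for e in chunk_embeddings]
ranked = sorted(zip(scores, chunks), reverse=True)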
Question answering
The best way to get reliably honest answers from GPT-3 is to give it source documents in which
it can locate correct answers. Using the semantic search procedure above, you can cheaply
search through a corpus of documents for relevant information and then give that information
to GPT-3 via the prompt to answer a question. We demonstrate this in
Question_answering_using_embeddings.ipynb.
Recommendations
Recommendations are quite similar to search, except that instead of a free-form text query, the
inputs are items in a set.
Similar to search, these cosine similarity scores can either be used on their own to rank items or
as features in larger ranking algorithms.
Customizing Embeddings
Although OpenAI's embedding model weights cannot be fine-tuned, you can nevertheless use
training data to customize embeddings to your application.
There is no simple answer - it depends. However, if your task involves logical reasoning or
complexity, consider trying the techniques in this article to build more reliable, high-performing
prompts.
If you were asked to multiply 13 by 17, would the answer pop immediately into your mind? For
most of us, probably not. Yet, that doesn't mean humans are incapable of two-digit
multiplication. With a few seconds, and some pen and paper, it's not too taxing to work out that
13 x 17 = 130 + 70 + 21 = 221.
Similarly, if you give GPT-3 a task that's too complex to do in the time it takes to calculate its
next token, it may confabulate an incorrect guess. Yet, akin to humans, that doesn't necessarily
mean the model is incapable of the task. With some time and space to reason things out, the
model still may be able to answer reliably.
As an example, if you ask gpt-3.5-turbo-instruct the following math problem about juggling
balls, it answers incorrectly:
Q: A juggler has 16 balls. Half of the balls are golf balls and half of the golf balls are blue. How many blue golf balls are there?
A:
Does this mean that GPT-3 cannot do simple math problems? No; in fact, it turns out that by
prompting the model with Let's think step by step , the model solves the problem reliably:
Q: A juggler has 16 balls. Half of the balls are golf balls and half of the golf balls are blue. How many blue golf balls are there?
A: Let's think step by step.
Of course, it's hard to tell from only a single example whether this Let's think step by step
trick actually works in general or just got lucky on this particular problem. But it really does
work. On a benchmark of word math problems, the Let's think step by step trick raised
GPT-3's solve rate massively, from a worthless 18% to a decent 79%!
When learning to work with GPT-3, one common conceptual mistake is to believe that its
capabilities are fixed across all contexts. E.g., if GPT-3 gets a simple logic question wrong, then it
must be incapable of simple logic.
But as the Let's think step by step example illustrates, apparent failures of GPT-3 can
sometimes be remedied with a better prompt that helps the model steer itself toward the
correct output.
Split complex tasks into simpler subtasks
One way to give a model more time and space to think is to break tasks into simpler pieces.
As an example, consider a task where we ask the model a multiple-choice question about some
text - in this case, a game of Clue. When asked directly, gpt-3.5-turbo-instruct isn't able to
put clues 3 & 5 together, and answers incorrectly:
Although clues 3 and 5 establish that Colonel Mustard was the only person in the observatory
and that the person in the observatory had the candlestick, the model fails to combine them
into a correct answer of (a) Yes.
However, instead of asking for the answer directly, we can split the task into three pieces:
First, go through the clues one by one and consider whether the clue is potentially relevant
Second, combine the relevant clues to reason out the answer to the question
Use the following clues to answer the following multiple-choice question, using the following proce
(1) First, go through the clues one by one and consider whether the clue is potentially relevant
(2) Second, combine the relevant clues to reason out the answer to the question
(3) Third, map the answer to one of the multiple choice answers: either (a), (b), or (c)
Clues:
1. Miss Scarlett was the only person in the lounge.
2. The person with the pipe was in the kitchen.
3. Colonel Mustard was the only person in the observatory.
4. Professor Plum was not in the library nor the billiard room.
5. The person with the candlestick was in the observatory.
Question: Was Colonel Mustard in the observatory with the candlestick?
(a) Yes; Colonel Mustard was in the observatory with the candlestick
(b) No; Colonel Mustard was not in the observatory with the candlestick
(c) Unknown; there is not enough information to determine whether Colonel Mustard was in the observ
Solution:
(1) First, go through the clues one by one and consider whether the clue is potentially relevant:
Clue 1: Miss Scarlett was the only person in the lounge. This clue is not relevant.
Clue 2: The person with the pipe was in the kitchen. This clue is not relevant.
Clue 3: Colonel Mustard was the only person in the observatory. This clue is relevant.
Clue 4: Professor Plum was not in the library nor the billiard room. This clue is not relevant.
Clue 5: The person with the candlestick was in the observatory. This clue is relevant.
(2) Second, combine the relevant clues to reason out the answer to the question:
Clue 3 and Clue 5 both say that the person with the candlestick was in the observatory. Colonel Mus
(3) Third, write the final answer:
(a) Yes; Colonel Mustard was in the observatory with the candlestick
By giving the model more time and space to think, and guiding it along a reasoning plan, it's
able to figure out the correct answer of (a) Yes.
Another benefit of splitting complex instructions into smaller subtasks is that it can help keep
the model focused on each subtask.
For example, if we ask gpt-3.5-turbo-instruct to summarize a text in its original language, the
model can lapse back into English:
Summarize the text using the original language of the text. The summary should be one sentence long
Text:
"""
La estadística (la forma femenina del término alemán Statistik, derivado a su vez del italiano stat
"""
Summary:
The text explains that statistics is a science that studies the variability, collection, organizati
However, if we first ask the model to identify the language of the text, and then summarize the
text, it becomes more reliable:
First, identify the language of the text. Second, summarize the text using the original language of
Text:
"""
La estadística (la forma femenina del término alemán Statistik, derivado a su vez del italiano stat
"""
Language:
Spanish
La estadística es una ciencia que estudia la variabilidad, colección, organización, análisis, inter
Another powerful technique for improving the reliability of answers is to prompt the model to
gradually reason out the answer rather than jumping immediately to the final answer. By
'thinking aloud' the model can be far more likely to arrive at the correct answer.
Zero-shot
Method
Published by Takeshi Kojima et al. in 2022, the easiest way to prompt a model to reason out
the answer is to simply prepend answers with Let's think step by step. Figure 2 illustrates
an example:
Source: Large Language Models are Zero-Shot Reasoners by Takeshi Kojima et al. (2022).
Results
Applying this simple trick to the MultiArith math dataset, the authors found Let's think step
by step quadrupled the accuracy, from 18% to 79%!
Source: Large Language Models are Zero-Shot Reasoners by Takeshi Kojima et al. (2022).
Implications
Although the Let's think step by step trick works well on math problems, it's not effective
on all tasks. The authors found that it was most helpful for multi-step arithmetic problems,
symbolic reasoning problems, strategy problems, and other reasoning problems. It didn't help
with simple math problems or common sense questions, and presumably wouldn't help with
many other non-reasoning tasks either.
Source: Large Language Models are Zero-Shot Reasoners by Takeshi Kojima et al. (2022).
If you apply this technique to your own tasks, don't be afraid to experiment with customizing
the instruction. Let's think step by step is rather generic, so you may find better
performance with instructions that hew to a stricter format customized to your use case. For
example, you can try more structured variants like First, think step by step about why X
might be true. Second, think step by step about why Y might be true. Third, think step
by step about whether X or Y makes more sense. And you can even give the model an example format to follow, as in the prompt below.
Using the IRS guidance below, answer the following questions using this format:
(1) For each criterion, determine whether it is met by the vehicle purchase
- {Criterion} Let's think step by step. {explanation} {yes or no, or if the question does not apply
(2) After considering each criterion in turn, phrase the final answer as "Because of {reasons}, the
IRS guidance:
"""
You may be eligible for a federal tax credit under Section 30D if you purchased a car or truck that
- Does the vehicle have at least four wheels?
- Does the vehicle weigh less than 14,000 pounds?
- Does the vehicle draw energy from a battery with at least 4 kilowatt hours that may be recharged
- Was the vehicle purchased in a year before 2022?
- If so, has the manufacturer sold less than 200,000 qualifying vehicles? (Tesla and GM have sold
- Was the vehicle purchased in a year after 2022?
- If so, is the vehicle present in the following list of North American-assembled vehicles? (The
"""
Question: Can I claim a federal tax credit for my Toyota Prius Prime bought in 2021?
Solution:
(1) For each criterion, determine whether it is met by the vehicle purchase
- Does the vehicle have at least four wheels? Let's think step by step.
The Toyota Prius Prime has four wheels, so the answer is yes.
- Does the vehicle weigh less than 14,000 pounds? Let's think step by step. The Toyota Prius Prime
- Does the vehicle draw energy from a battery with at least 4 kilowatt hours that may be recharged
- Was the vehicle purchased in a year before 2022? Let's think step by step. The Toyota Prius Prime
- Was the vehicle purchased in a year after 2022? N/A
- If so, is the vehicle present in the following list of North American-assembled vehicles? N/A
(2) After considering each criterion in turn, phrase the final answer as "Because of {reasons}, the
Because the Toyota Prius Prime meets all of the criteria for a federal tax credit, the answer is li
Few-shot examples
Method
Prompting the model to reason out its answers can be done in many ways. One way is to
demonstrate with a few examples ('few-shot'), as studied by Jason Wei and Denny Zhou et al.
from Google. Here's an example few-shot chain-of-thought prompt:
Source: Chain of Thought Prompting Elicits Reasoning in Large Language Models Jason Wei
and Denny Zhou et al. (2022)
(Note that it has been called into question whether pears actually float)
Results
Testing on grade school math problems, the authors found that chain of thought prompting
tripled the solve rate, from 18% to 57%.
Source: Chain of Thought Prompting Elicits Reasoning in Large Language Models Jason Wei
and Denny Zhou et al. (2022)
In addition to math problems, chain of thought prompting also lifted performance on questions
related to sports understanding, coin flip tracking, and last letter concatenation. In most cases,
not many examples were needed to saturate the performance gains (less than 8 or so).
Source: Chain of Thought Prompting Elicits Reasoning in Large Language Models Jason Wei
and Denny Zhou et al. (2022)
Implications
One advantage of the few-shot example-based approach relative to the Let's think step by
step technique is that you can more easily specify the format, length, and style of reasoning
that you want the model to perform before landing on its final answer. This can be particularly
helpful in cases where the model isn't initially reasoning in the right way or depth.
Fine-tuned
Method
In general, to eke out maximum performance on a task, you'll need to fine-tune a custom
model. However, fine-tuning a model using explanations may take thousands of example
explanations, which are costly to write.
In 2022, Eric Zelikman and Yuhuai Wu et al. published a clever procedure for using a few-shot
prompt to generate a dataset of explanations that could be used to fine-tune a model. The idea
is to use a few-shot prompt to generate candidate explanations, and only keep the explanations
that produce the correct answer. Then, to get additional explanations for some of the incorrect
answers, retry the few-shot prompt but with correct answers given as part of the question. The
authors called their procedure STaR (Self-taught Reasoner):
Source: STaR: Bootstrapping Reasoning With Reasoning by Eric Zelikman and Yuhuai Wu et al.
(2022)
With this technique, you can combine the benefits of fine-tuning with the benefits of chain-of-
thought prompting without needing to write thousands of example explanations.
Results
When the authors applied this technique to a Common Sense Q&A dataset, they found that
STaR outperformed both chain-of-thought prompting alone (73% > 37%) and fine-tuning alone
(73% > 60%):
Source: STaR: Bootstrapping Reasoning With Reasoning by Eric Zelikman and Yuhuai Wu et al.
(2022)
Implications
Using a few-shot prompt to extend or modify a fine-tuning dataset is an idea that can be
generalized beyond explanation writing. For example, if you have large quantities of
unstructured text that you want to train on, you may find opportunities to use a prompt to
extract a structured dataset from your unstructured text, and then fine-tune a custom model on
that structured dataset.
Selection-inference prompting
Method
Published by Antonia Creswell et al., one extension of the chain-of-thought technique is to split
the single prompt for generating explanations and answers into smaller parts. First, a prompt
selects a relevant subset of facts from the text ('selection prompt'). Then, a second prompt infers
a conclusion from the selected facts ('inference prompt'). These prompts are then alternated in
a loop to generate multiple steps of reasoning and eventually land on a final answer. The
authors illustrate the idea in the following figure:
Source: Selection-Inference: Exploiting Large Language Models for Interpretable Logical
Reasoning by Antonia Creswell et al. (2022)
Results
When applied to a 7B-parameter model, the authors found that selection-inference prompting
substantially improved performance relative to chain-of-thought prompting on the bAbi and
Proof Writer benchmark tasks (both of which require longer sequences of reasoning steps). The
best performance they achieved combined both selection-inference prompting with fine-tuning.
Source: Selection-Inference: Exploiting Large Language Models for Interpretable Logical
Reasoning by Antonia Creswell et al. (2022)
Implications
Although the gains on these benchmarks were large, these benchmarks were specifically chosen
because they required longer sequences of reasoning. On problems that don't require
reasoning with many steps, the gains are likely smaller.
The results highlight a couple of general lessons for working with large language models. One,
splitting up complex tasks into smaller tasks is a great way to improve reliability and
performance; the more atomic the task, the less room there is for the model to err. Two, getting
maximum performance often means combining fine-tuning with whatever approach you've
chosen.
reducing hallucination of fake facts by fine-tuning a model to reason about sentence labels
(e.g., sen1) rather than writing out the sentences themselves
Method
In the original selection-inference technique, specialized 'selection' and 'inference' prompts are
alternated to select facts and make inferences from those facts, combining to generate a
sequence of reasoning steps.
First, the authors add a 'halter' model that, after each inference step, is asked whether the
inferences thus far are sufficient to answer the question. If yes, then the model generates a final
answer.
if the process never halts, you'll get no answer, which is often preferable to a hallucinated
guess
Source: Faithful Reasoning Using Large Language Models by Antonia Creswell et al. (2022)
Source: Faithful Reasoning Using Large Language Models by Antonia Creswell et al. (2022)
Second, the authors add a value function, which is used to assess the quality of reasoning steps
and search over multiple reasoning trajectories. This echoes a common theme for increasing
reliability; instead of generating a single answer from the model, generate a set of answers and
then use some type of value function / discriminator / verifier model to pick the best one.
Source: Faithful Reasoning Using Large Language Models by Antonia Creswell et al. (2022)
In addition to these two extensions, the authors also use a trick to reduce hallucination of fake
facts. Rather than asking the model to write out factual sentences, they fine-tune a model to
work with sentence labels (e.g., sen1) instead. This helps prevent the model from hallucinating
fake facts not mentioned in the prompt context.
Source: Faithful Reasoning Using Large Language Models by Antonia Creswell et al. (2022)
Results
The authors evaluated their technique on two benchmarks: the ProofWriter task (not shown)
and EntailmentBankQA (shown). The technique increased accuracy substantially, especially on
harder reasoning problems.
Source: Faithful Reasoning Using Large Language Models by Antonia Creswell et al. (2022) (https://arxiv.org/abs/2208.14271)
Implications
This paper illustrates a number of helpful lessons for improving the reliability of large language
models:
Generate many possible answers and use another model or function to pick the ones that
look best
Reduce hallucination by constraining what the model can say (e.g., by using sentence labels
instead of sentences)
Least-to-most prompting
In addition to doing poorly on long reasoning chains (where selection-inference shines), chain-
of-thought prompting can especially struggle when the examples are short but the task is long.
Method
Least-to-most prompting is another technique that splits up reasoning tasks into smaller, more
reliable subtasks. The idea is to elicit a subtask from the model by prompting it with something
like To solve {question}, we need to first solve: " . Then, with that subtask in hand, the
model can generate a solution. The solution is appended to the original question and the
process is repeated until a final answer is produced.
Source: Least-to-most Prompting Enables Complex Reasoning in Large Language Models by
Denny Zhou et al. (2022)
Results
When applied to benchmarks involving long reasoning chains using code-davinci-002 (which
is optimized for code but can still understand text), the authors measured gains as large as 16%
-> 99.7%!
Implications
Although the above gains from least-to-most prompting are impressive, they are measured on
a very narrow set of tasks that require long reasoning chains.
Still, they illustrate a common theme: increase reliability by (a) breaking complex tasks into
smaller subtasks and (b) giving the model more time and space to work out the answer.
Related ideas
Maieutic prompting
Method
In contrast to the previous techniques, which try to maximize the likelihood of correct answers,
another approach is to use GPT-3 to generate a tree of possible explanations (both correct and
incorrect), and then analyze their relationships to guess at which set is correct. This technique
was coined maieutic prompting by Jaehun Jung et al. in May 2022 (maieutic means relating to
the Socratic method of asking questions to elicit ideas).
First, build a maieutic tree, where each node is a statement that could be true or false:
Start with a multiple-choice question or true/false statement (e.g., War cannot have a tie)
For each possible answer to the question, use the model to generate a corresponding
explanation (with a prompt like War cannot have a tie? True, because ...)
Then, prompt the model with the question and the generated explanation, and ask it
to produce the answer. If reversing the explanation (with a prefix like It is wrong to
say that {explanation}) reverses the answer, then the explanation is considered
'logically integral.'
If an explanation is not logically integral, then repeat the above process recursively,
with each explanation turned into a True or False question, and generate more
explanations for each new question.
After all of the recursive explaining is done, you end up with a tree of explanations,
where each leaf on the tree has the property that reversing the explanation reverses
the model's answer.
Second, convert the tree into a graph of relations:
For each node in the tree, calculate the model's relative belief in it (inferred
from the probability of getting an answer of True given the explanation)
For each pair of nodes in the tree, use the model to identify whether they are entailed
(implied) or contradicted
Third, find the most consistent set of beliefs and take those to be true:
Specifically, using the strength of belief in each node and the logical relationships
between them, formulate the problem as a weighted maximum satisfiability problem
(MAX-SAT)
Use a solver to find the most self-consistent set of beliefs, and take those as true
Source: Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations by
Jaehun Jung et al. (2022)
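As a rough sketch of just the "logically integral" check described above (the tree construction, belief estimation, and MAX-SAT step are not shown; ask is a hypothetical helper that returns "True" or "False"):

def is_logically_integral(statement, explanation, ask):
    """An explanation is 'logically integral' if negating it flips the model's answer."""
    answer = ask(f"{statement}? {explanation} True or False?")
    reversed_answer = ask(f"{statement}? It is wrong to say that {explanation} True or False?")
    return answer.strip().lower() != reversed_answer.strip().lower()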
Results
Beyond the complexity, one limitation of this method is that it appears to only apply to
questions that can be posed as multiple-choice.
Extensions
Self-consistency
Method
For tasks with a discrete set of answers, one simple way to improve reliability is to sample
multiple explanations & answers from the model (using a positive temperature) and then pick
the final answer that appears most often.
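A minimal sketch of this majority vote, assuming the OpenAI Python v1 client; the model name, prompt suffix, and answer-parsing convention are placeholders:

from collections import Counter
from openai import OpenAI

client = OpenAI()

def self_consistent_answer(question, n=10, model="gpt-3.5-turbo"):
    """Sample several step-by-step completions and return the most common final answer."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": question + "\nThink step by step, then give the final answer after 'Answer:'.",
        }],
        temperature=0.7,  # positive temperature so the sampled reasoning paths differ
        n=n,              # n independent samples in a single API call
    )
    answers = [
        choice.message.content.split("Answer:")[-1].strip()
        for choice in response.choices
        if "Answer:" in choice.message.content
    ]
    return Counter(answers).most_common(1)[0][0] if answers else None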
Results
This technique lifted accuracies by anywhere from 1 to 24 percentage points on a suite of math
and reasoning benchmarks. (Plotted below are results from Google's LaMDA model; using
Google's larger PaLM model, the baselines were higher but the gains were a bit smaller.)
Source: Self-Consistency Improves Chain of Thought Reasoning in Language Models by Xuezhi
Wang et al. (2022)
Implications
Although this technique is simple to implement, it can be costly. Generating a set of 10 answers
will increase your costs by 10x.
Also, as with many of these techniques, it applies only to tasks with a limited set of answers. For
open-ended tasks where each answer is unique (such as writing a poem), it's not obvious what
it would mean to pick the most common answer.
Lastly, this technique ought to be most beneficial when there are multiple paths or phrasings to
reach an answer; if there's only one path, then the technique may not help at all. An extreme
example: If the task was to generate a single token answer, then taking the most common token
from 100 generations would be no different than taking the token with the highest logprobs
(which you can get with a single generation at temperature=0).
Verifiers
Another key technique for improving task performance is to train a verifier or discriminator
model to evaluate the outputs of the main generative model. If the discriminator rejects the
output, then you can resample the generative model until you get an acceptable output. In
many cases, it's easier to judge an answer than it is to create an answer, which helps explain the
power of this method.
Method
In 2021, OpenAI researchers applied this technique to grade school math problems, using the
following procedure:
For each problem in the training set, they generated 100 solutions
Each of those 100 solutions was automatically labeled as either correct or incorrect, based
on whether the final answer was correct
Using those solutions, with some labeled correct and some labeled incorrect, they fine-
tuned a verifier model to classify whether a question and candidate solution was correct or
incorrect
Finally, at test time, the generative model creates 100 solutions to each problem, and the
one with the highest score according to the verifier model is picked as the final answer
Source: Training Verifiers to Solve Math Word Problems by Karl Cobbe et al. (2021)
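In the same spirit (though not the paper's exact training setup), best-of-n reranking with an existing verifier can be sketched as follows; score_solution stands in for whatever fine-tuned verifier or scoring model you have:

from openai import OpenAI

client = OpenAI()

def best_of_n(problem, score_solution, n=100, model="gpt-3.5-turbo"):
    """Generate n candidate solutions and return the one the verifier scores highest."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": problem}],
        temperature=0.8,  # encourage diverse candidate solutions
        n=n,
    )
    candidates = [choice.message.content for choice in response.choices]
    # score_solution(problem, candidate) -> float is the assumed verifier interface
    return max(candidates, key=lambda c: score_solution(problem, c))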
Results
With a 175B GPT-3 model and 8,000 training examples, this technique substantially lifted grade
school math accuracy from ~33% to ~55%.
Source: Training Verifiers to Solve Math Word Problems by Karl Cobbe et al. (2021)
Implications
Similar to the self-consistency technique, this method can get expensive, as generating, say, 100
solutions per task will increase your costs by roughly 100x.
Theories of reliability
Although the techniques above vary in their approach, they all share the goal of improving
reliability on complex tasks. Mainly they do this by:
decomposing unreliable operations into smaller, more reliable operations (e.g., selection-
inference prompting)
using multiple steps or multiple relationships to make the system's reliability greater than
any individual component (e.g., maieutic prompting)
In the paper Language Model Cascades, David Dohan et al. interpret the above techniques in the
paradigm of probabilistic graphical models:
Selection-inference prompting
Verifiers
Source: Language Model Cascades by David Dohan et al. (2022)
Closing thoughts
Research into large language models is very active and evolving rapidly. Not only do
researchers continue to improve the models, they also continue to improve our understanding
of how to best employ the models. To underscore the pace of these developments, note that all
of the papers shared above were published within the past 12 months (as I write in Sep 2022).
In the future, expect better models and better techniques to be published. Even if the specific
techniques here are eclipsed by future best practices, the general principles behind them will
likely remain a key part of any expert user's toolkit.
Bibliography
| Lesson | Paper | Date |
| --- | --- | --- |
| Break complex tasks into simpler subtasks (and consider exposing the intermediate outputs to users) | AI Chains: Transparent and Controllable Human-AI Interaction by Chaining Large Language Model Prompts | Oct 2021 |
| You can improve output by generating many candidates, and then picking the one that looks best | Training Verifiers to Solve Math Word Problems | Oct 2021 |
| On reasoning tasks, models do better when they reason step-by-step before answering | Chain of Thought Prompting Elicits Reasoning in Large Language Models | Jan 2022 |
| You can improve step-by-step reasoning by generating many explanation-answer outputs, and picking the most popular answer | Self-Consistency Improves Chain of Thought Reasoning in Language Models | Mar 2022 |
| If you want to fine-tune a step-by-step reasoner, you can do it with multiple-choice question & answer data alone | STaR: Bootstrapping Reasoning With Reasoning | Mar 2022 |
| The step-by-step reasoning method works great even with zero examples | Large Language Models are Zero-Shot Reasoners | May 2022 |
| You can do better than step-by-step reasoning by alternating a ‘selection’ prompt and an ‘inference’ prompt | Selection-Inference: Exploiting Large Language Models for Interpretable Logical Reasoning | May 2022 |
| On long reasoning problems, you can improve step-by-step reasoning by splitting the problem into pieces to solve incrementally | Least-to-most Prompting Enables Complex Reasoning in Large Language Models | May 2022 |
| You can have the model analyze both good and bogus explanations to figure out which set of explanations are most consistent | Maieutic Prompting: Logically Consistent Reasoning with Recursive Explanations | May 2022 |
| You can think about these techniques in terms of probabilistic programming, where systems comprise unreliable components | Language Model Cascades | Jul 2022 |
| You can eliminate hallucination with sentence label manipulation, and you can reduce wrong answers with a 'halter' prompt | Faithful Reasoning Using Large Language Models | Aug 2022 |
CassIO version
In this quickstart you will learn how to build a "philosophy quote finder & generator" using
OpenAI's vector embeddings and Apache Cassandra®, or equivalently DataStax Astra DB
through CQL, as the vector store for data persistence.
The basic workflow of this notebook is outlined below. You will evaluate and store the vector
embeddings for a number of quotes by famous philosophers, use them to build a powerful
search engine and, after that, even a generator of new quotes!
The notebook exemplifies some of the standard usage patterns of vector search -- while
showing how easy it is to get started with the vector capabilities of Cassandra / Astra DB
through CQL.
For a background on using vector search and text embeddings to build a question-answering
system, please check out this excellent hands-on notebook: Question answering using
embeddings.
Choose-your-framework
Please note that this notebook uses the CassIO library, but we cover other choices of
technology to accomplish the same task. Check out this folder's README for other options. This
notebook can run either as a Colab notebook or as a regular Jupyter notebook.
Table of contents:
Setup
Get DB connection
Connect to OpenAI
How it works
Indexing
Each quote is made into an embedding vector with OpenAI's embedding API. These are saved in the
Vector Store for later use in searching. Some metadata, including the author's name and a few
other pre-computed tags, is stored alongside to allow for search customization.
Search
To find a quote similar to the provided search quote, the latter is made into an embedding
vector on the fly, and this vector is used to query the store for similar vectors ... i.e. similar
quotes that were previously indexed. The search can optionally be constrained by additional
metadata ("find me quotes by Spinoza similar to this one ...").
The key point here is that "quotes similar in content" translates, in vector space, to vectors that
are metrically close to each other: thus, vector similarity search effectively implements semantic
similarity. This is the key reason vector embeddings are so powerful.
The sketch below tries to convey this idea. Each quote, once it's made into a vector, is a point in
space. Well, in this case it's on a sphere, since OpenAI's embedding vectors, as most others, are
normalized to unit length. Oh, and the sphere is actually not three-dimensional, rather 1536-
dimensional!
So, in essence, a similarity search in vector space returns the vectors that are closest to the
query vector:
Generation
Given a suggestion (a topic or a tentative quote), the search step is performed, and the first
returned results (quotes) are fed into an LLM prompt which asks the generative model to invent
a new text along the lines of the passed examples and the initial suggestion.
Setup
import cassio
from cassio.table import MetadataVectorCassandraTable
import openai
from datasets import load_dataset
Get DB connection
In order to connect to your Astra DB through CQL, you need two things: the database ID and a
database token. Make sure you have both strings -- they are obtained in the Astra UI once you
sign in. For more information, see here: database ID and Token.
If you instead want to connect to a Cassandra cluster (which, however, must support Vector Search),
initialize with cassio.init(session=..., keyspace=...), passing a suitable Session and keyspace
name for your cluster.
cassio.init(token=astra_token, database_id=database_id)
Creation of the DB connection
This is how you create a connection to Astra DB through CQL:
(Incidentally, you could also use any Cassandra cluster (as long as it provides Vector capabilities),
just by changing the parameters to the following Cluster instantiation.)
Connect to OpenAI
Quickly check how one can get the embedding vectors for a list of input texts:
client = openai.OpenAI(api_key=OPENAI_API_KEY)
embedding_model_name = "text-embedding-3-small"
result = client.embeddings.create(
input=[
"This is a sentence",
"A second sentence"
],
model=embedding_model_name,
)
Note: the above is the syntax for OpenAI v1.0+. If using previous versions, the code to get the
embeddings will look different.
print(f"len(result.data) = {len(result.data)}")
print(f"result.data[1].embedding = {str(result.data[1].embedding)[:55]}...")
print(f"len(result.data[1].embedding) = {len(result.data[1].embedding)}")
len(result.data) = 2
result.data[1].embedding = [-0.010821706615388393, 0.001387271680869162, 0.0035479...
len(result.data[1].embedding) = 1536
philo_dataset = load_dataset("datastax/philosopher-quotes")["train"]
A quick inspection:
An example entry:
{'author': 'aristotle', 'quote': 'Love well, be loved and do something of value.', 'tags': 'lov
To optimize speed and reduce the calls, you'll perform batched calls to the OpenAI embedding
service.
(Note: for faster execution, Cassandra and CassIO would let you do concurrent inserts, which we
don't do here to keep the demo code more straightforward.)
BATCH_SIZE = 50
quotes_list = philo_dataset["quote"]
authors_list = philo_dataset["author"]
tags_list = philo_dataset["tags"]
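A sketch of the batched indexing loop is below; the table name and the 1536-dimension figure follow the rest of this notebook, while the row_id scheme, the ";"-separated tag format, and the exact put keyword arguments are assumptions about the CassIO MetadataVectorCassandraTable API:

v_table = MetadataVectorCassandraTable(table="philosophers_cassio", vector_dimension=1536)

for batch_start in range(0, len(quotes_list), BATCH_SIZE):
    batch = quotes_list[batch_start : batch_start + BATCH_SIZE]
    # One embedding call per batch of quotes
    embeddings = client.embeddings.create(input=list(batch), model=embedding_model_name).data
    for offset, emb in enumerate(embeddings):
        idx = batch_start + offset
        author = authors_list[idx]
        # Tags are assumed to be a ";"-separated string (possibly empty)
        tags = {tag: True for tag in (tags_list[idx] or "").split(";") if tag}
        v_table.put(
            row_id=f"q_{author}_{idx}",
            body_blob=quotes_list[idx],
            vector=emb.embedding,
            metadata={**tags, "author": author},
        )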
For the quote-search functionality, you first need to make the input quote into a vector, and
then use it to query the store (besides passing the optional metadata to the search call, that
is).
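For reference, a sketch of such a search helper; it relies on the v_table created above, and while ann_search appears later in this notebook, the metadata filter and the result field names (body_blob, metadata) are assumptions about the CassIO API surface:

def find_quote_and_author(query_quote, n, author=None, tags=None):
    # Embed the query quote on the fly
    query_vector = client.embeddings.create(
        input=[query_quote],
        model=embedding_model_name,
    ).data[0].embedding
    # Optional metadata filter (author and/or tags)
    metadata = {}
    if author:
        metadata["author"] = author
    if tags:
        for tag in tags:
            metadata[tag] = True
    results = v_table.ann_search(query_vector, n=n, metadata=metadata)
    return [(row["body_blob"], row["metadata"]["author"]) for row in results]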
[('Life to the great majority is only a constant struggle for mere existence, with the certaint
'schopenhauer'),
('We give up leisure in order that we may have leisure, just as we go to war in order that we
'aristotle'),
('Perhaps the gods are kind to us, by making life more disagreeable as we grow older. In the e
'freud')]
Search constrained to a tag (out of those saved earlier with the quotes):
[('Mankind will never see an end of trouble until lovers of wisdom come to hold political power
'plato'),
('Everything the State says is a lie, and everything it has it has stolen.',
'nietzsche')]
To keep this issue under control, you can get the actual "distance" between the query and each
result, and then set a cutoff on it, effectively discarding results that are beyond that threshold.
Tuning this threshold correctly is not an easy problem: here, we'll just show you the way.
To get a feeling on how this works, try the following query and play with the choice of quote
and threshold to compare the results:
Note (for the mathematically inclined): this "distance" is exactly the cosine similarity between the
vectors, i.e. the scalar product divided by the product of the norms of the two vectors. As such, it is
a number ranging from -1 to +1, where -1 is for exactly opposite-facing vectors and +1 for
identically-oriented vectors. Elsewhere (e.g. in the "CQL" counterpart of this demo) you would get
a rescaling of this quantity to fit the [0, 1] interval, which means the resulting numerical values
and adequate thresholds there are transformed accordingly.
metric_threshold = 0.84
quote_vector = client.embeddings.create(
input=[quote],
model=embedding_model_name,
).data[0].embedding
results = list(v_table.metric_ann_search(
quote_vector,
n=8,
metric="cos",
metric_threshold=metric_threshold,
))
For this task you need another component from OpenAI, namely an LLM to generate the quote
for us (based on input obtained by querying the Vector Store).
You also need a template for the prompt that will be filled for the generate-quote LLM
completion task.
completion_model_name = "gpt-3.5-turbo"
# (Prompt wording reconstructed from the description above; tweak it to taste.)
generation_prompt_template = """Generate a single short philosophical quote on the given topic,
similar in spirit and form to the provided actual example quotes.
Do not exceed 20-30 words in your quote.
REFERENCE TOPIC: "{topic}"
ACTUAL EXAMPLES:
{examples}
"""
Like for search, this functionality is best wrapped into a handy function (which internally uses
search):
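A sketch of such a wrapper; it uses the find_quote_and_author helper and the prompt template sketched above, and the sampling parameters are arbitrary choices:

def generate_quote(topic, n=2, author=None, tags=None):
    examples = find_quote_and_author(topic, n, author=author, tags=tags)
    if not examples:
        return "** no example quotes found: cannot generate. **"
    prompt = generation_prompt_template.format(
        topic=topic,
        examples="\n".join(f"  - {quote} ({auth})" for quote, auth in examples),
    )
    response = client.chat.completions.create(
        model=completion_model_name,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,
        max_tokens=300,
    )
    return response.choices[0].message.content.strip()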
Just passing a text (a "quote", but one can actually just suggest a topic since its vector
embedding will still end up at the right place in the vector space):
** quotes found:
** - Happiness is the reward of virtue. (aristotle)
** - Our moral virtues benefit mainly other people; intellectual virtues, on the other hand,
** end of logging
** quotes found:
** - Because Christian morality leaves animals out of account, they are at once outlawed in
** - The assumption that animals are without rights, and the illusion that our treatment of
** end of logging
(Optional) Partitioning
There's an interesting topic to examine before completing this quickstart. While, generally, tags
and quotes can be in any relationship (e.g. a quote having multiple tags), authors are effectively
an exact grouping (they define a "disjoint partitioning" on the set of quotes): each quote has
exactly one author (for us, at least).
Now, suppose you know in advance your application will usually (or always) run queries on a
single author. Then you can take full advantage of the underlying database structure: if you
group quotes in partitions (one per author), vector queries on just an author will use less
resources and return much faster.
We'll not dive into the details here, which have to do with the Cassandra storage internals: the
important message is that if your queries are run within a group, consider partitioning
accordingly to boost performance.
v_table_partitioned = ClusteredMetadataVectorCassandraTable(table="philosophers_cassio_partitioned", vector_dimension=1536)  # same 1536-dimensional vectors as before
Compared to what you have seen earlier, there is a crucial difference in that now the quote's
author is stored as the partition id for the inserted row, instead of being added to the catch-all
"metadata" dictionary.
While you are at it, by way of demonstration, you will insert all quotes by a given author
concurrently: with CassIO, this is done by using the asynchronous put_async method for each
quote, collecting the resulting list of Future objects, and calling the result() method on
them all afterwards, to ensure they have all executed. Cassandra / Astra DB supports a high
degree of concurrency in I/O operations.
(Note: one could have cached the embeddings computed previously to save a few API tokens --
here, however, we wanted to keep the code easier to inspect.)
BATCH_SIZE = 50
quotes_list = philo_dataset["quote"]
authors_list = philo_dataset["author"]
tags_list = philo_dataset["tags"]
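A sketch of the concurrent insertion (one embedding call per quote here, for brevity; the put_async keyword arguments mirror put, with the author used as partition_id as described above, which is an assumption about the CassIO clustered-table API):

futures = []
for idx, quote in enumerate(quotes_list):
    author = authors_list[idx]
    emb = client.embeddings.create(input=[quote], model=embedding_model_name).data[0].embedding
    futures.append(
        v_table_partitioned.put_async(
            partition_id=author,        # the author is now the partition id
            row_id=f"q_{author}_{idx}",
            body_blob=quote,
            vector=emb,
        )
    )
# Ensure every asynchronous insertion has completed
for future in futures:
    future.result()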
With this new table, the similarity search changes accordingly (note the arguments to
ann_search ):
That's it: the new table still supports the "generic" similarity searches all right ...
[('Life to the great majority is only a constant struggle for mere existence, with the certaint
'schopenhauer'),
('We give up leisure in order that we may have leisure, just as we go to war in order that we
'aristotle'),
('Perhaps the gods are kind to us, by making life more disagreeable as we grow older. In the e
'freud')]
... but it's when an author is specified that you would notice a huge performance advantage:
find_quote_and_author_p("We struggle all our life for nothing", 2, author="nietzsche")
Well, you would notice a performance gain, if you had a realistic-size dataset. In this demo, with
a few tens of entries, there's no noticeable difference -- but you get the idea.
Conclusion
Congratulations! You have learned how to use OpenAI for vector embeddings and Cassandra /
Astra DB through CQL for storage in order to build a sophisticated philosophical search engine
and quote generator.
This example used CassIO to interface with the Vector Store - but this is not the only choice.
Check the README for other options and integration with popular frameworks.
To find out more on how Astra DB's Vector Search capabilities can be a key ingredient in your
ML/GenAI applications, visit Astra DB's web page on the topic.
Cleanup
If you want to remove all resources used for this demo, run this cell (warning: this will delete the
tables and the data inserted in them!):
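A minimal cleanup sketch, assuming the session and keyspace registered by cassio.init are still available through cassio.config; the table names are the ones used above:

session = cassio.config.resolve_session()
keyspace = cassio.config.resolve_keyspace()
session.execute(f"DROP TABLE IF EXISTS {keyspace}.philosophers_cassio;")
session.execute(f"DROP TABLE IF EXISTS {keyspace}.philosophers_cassio_partitioned;")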
<cassandra.cluster.ResultSet at 0x7fdcc42e8f10>
SingleStoreDB
arno756
Open in Github
May 21, 2023
SingleStoreDB has first-class support for vector search through our Vector Functions. Our
vector database subsystem, first made available in 2017 and subsequently enhanced, allows
extremely fast nearest-neighbor search to find objects that are semantically similar, easily using
SQL.
SingleStoreDB supports vectors and vector similarity search using dot_product (for cosine
similarity) and euclidean_distance functions. These functions are used by our customers for
applications including face recognition, visual product photo search and text-based semantic
search. With the explosion of generative AI technology, these capabilities form a firm
foundation for text-based AI chatbots.
But remember, SingleStoreDB is a high-performance, scalable, modern SQL DBMS that supports
multiple data models including structured data, semi-structured data based on JSON, time-
series, full text, spatial, key-value and of course vector data. Start powering your next intelligent
application with SingleStoreDB today!
This folder contains examples of using SingleStoreDB and OpenAI together. We will keep
adding more scenarios so stay tuned!
| Name | Description |
| --- | --- |
| OpenAI wikipedia semantic search | Improve ChatGPT accuracy through SingleStoreDB semantic search in QA |
This notebook presents how to implement a Question Answering system with Langchain,
Qdrant as a knowledge base, and OpenAI embeddings. If you are not familiar with Qdrant, it's
better to check out the Getting_started_with_Qdrant_and_OpenAI.ipynb notebook.
4. Using Qdrant to perform the nearest neighbour search in the created collection to find
some context.
All the steps will be simplified to calling some corresponding Langchain methods.
Prerequisites
3. Langchain as a framework.
We're going to use a local Qdrant instance running in a Docker container. The easiest way to
launch it is to use the attached [docker-compose.yaml] file and run the following command:
! docker-compose up -d
We can validate that the server launched successfully by running a simple curl command:
! curl http://localhost:6333
Install requirements
This notebook obviously requires the openai, langchain and qdrant-client packages.
The OpenAI API key is used for vectorization of the documents and queries.
If you don't have an OpenAI API key, you can get one from
https://beta.openai.com/account/api-keys.
Once you get your key, please add it to your environment variables as OPENAI_API_KEY by
running the following command:
# Note. alternatively you can set a temporary env variable like this:
# os.environ["OPENAI_API_KEY"] = "sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
OPENAI_API_KEY is ready
Load data
In this section we are going to load the data containing some natural questions and answers to
them. All the data will be used to create a Langchain application with Qdrant being the
knowledge base.
import wget
'answers.json'
import json
print(answers[0])
No . overall No. in season Title Directed by Written by Original air date U.S. viewers ( millio
Chain definition
Langchain is already integrated with Qdrant and performs all the indexing for a given list of
documents. In our case we are going to store the set of answers we have.
embeddings = OpenAIEmbeddings()
doc_store = Qdrant.from_texts(
answers, embeddings, host="localhost"
)
At this stage all the possible answers are already stored in Qdrant, so we can define the whole
QA chain.
llm = OpenAI()
qa = VectorDBQA.from_chain_type(
llm=llm,
chain_type="stuff",
vectorstore=doc_store,
return_source_documents=False,
)
Search data
Once the data is put into Qdrant we can start asking some questions. A question will be
automatically vectorized by the OpenAI embedding model, and the created vector will be used to find
some possibly matching answers in Qdrant. Once retrieved, the most similar answers will be
incorporated into the prompt sent to the OpenAI Large Language Model. The communication
between all the services is shown in the diagram:
import random
random.seed(52)
selected_questions = random.choices(questions, k=5)
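Running the chain over those questions might look like this (qa is the VectorDBQA chain defined above):

for question in selected_questions:
    print(">", question)
    print(qa.run(question), end="\n\n")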
Use the following pieces of context to answer the question at the end. If you don't know the answer
{context}
Question: {question}
Helpful Answer:
We can, however, provide our prompt template and change the behaviour of the OpenAI LLM,
while still using the stuff chain type. It is important to keep {context} and {question} as
placeholders.
2. Suggests a random song title if it doesn't know the answer to our question.
custom_prompt = """
Use the following pieces of context to answer the question at the end. Please provide
a short single-sentence summary answer only. If you don't know the answer or if it's
not present in given context, don't try to make up an answer, but suggest me a random
unrelated song title I could listen to.
Context: {context}
Question: {question}
Helpful Answer:
"""
custom_prompt_template = PromptTemplate(
template=custom_prompt, input_variables=["context", "question"]
)
custom_qa = VectorDBQA.from_chain_type(
llm=llm,
chain_type="stuff",
vectorstore=doc_store,
return_source_documents=False,
chain_type_kwargs={"prompt": custom_prompt_template},
)
random.seed(41)
for question in random.choices(questions, k=5):
print(">", question)
print(custom_qa.run(question), end="\n\n")
> what was uncle jesse's original last name on full house
Uncle Jesse's original last name on Full House was Cochran.
> the first civil service commission in india was set up on the basis of recommendation of
The first Civil Service Commission in India was not set up on the basis of a recommendation.
This notebook is centered around the integration of the Google Places API and custom user
profiles to enhance location-based searches. Our approach involves using the Google Places API
in combination with user preferences, aiming to make location discovery more personal and
relevant. Please note that while we focus on the Google Places API in this instance, there are
numerous other APIs you could explore and apply in a similar fashion.
Customer profile: This mock profile captures individual preferences for types of places (e.g.,
restaurants, parks, museums), budget, preferred ratings, and other specific requirements.
Google Places API: This API provides real-time data about nearby places. It factors in
various data points such as ratings, types of venues, costs, and more from the locations
around you.
Function calling: A single command such as "I'm hungry" or "I want to visit a museum"
activates the function which combines the user profile data and Google Places API to
identify suitable venues.
Profile-based recommendations: Learn how to create a user profile and make place
recommendations based on individual preferences.
API integration with function calling: Understand how to integrate and call Google Places
API effectively to source real-time data of various places using function calling.
Please note that while this system is highly versatile, its effectiveness may vary based on user
preferences and available place data. For the purposes of this notebook, the customer data is
fake and the location is hardcoded.
Setup
Google Account: If you don't already have one, you will need to create a Google account.
Google Places API Key: The API key is a unique identifier that is used to authenticate
requests associated with your project for usage and billing purposes. You can get your API
key from the Google Cloud Console.
Please note that Google Places API is a paid service, and the cost is associated with the number
of API calls made. Keep track of your usage to avoid any unexpected charges.
The requests library is also needed; you can install it with the following command:
import json
from openai import OpenAI
import os
import requests
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY", "<your OpenAI API key if not set as env var>
This function simulates an API call that fetches user data from a database. For this demo, we're
using hard-coded data. The user profile contains various details such as the user's location (set
to the coordinates of the Golden Gate Bridge for this example), preferences in food and
activities, app usage metrics, recent interactions, and user rank.
In a production environment, you would replace this hard-coded data with a real API call to
your user database.
def fetch_customer_profile(user_id):
# You can replace this with a real API call in the production code
if user_id == "user1234":
return {
"name": "John Doe",
"location": {
"latitude": 37.7955,
"longitude": -122.4026,
},
"preferences": {
"food": ["Italian", "Sushi"],
"activities": ["Hiking", "Reading"],
},
"behavioral_metrics": {
"app_usage": {
"daily": 2, # hours
"weekly": 14 # hours
},
"favourite_post_categories": ["Nature", "Food", "Books"],
"active_time": "Evening",
},
"recent_searches": ["Italian restaurants nearby", "Book clubs"],
"recent_interactions": ["Liked a post about 'Best Pizzas in New York'", "Commented on a p
"user_rank": "Gold", # based on some internal ranking system
}
else:
return None
The function call_google_places_api serves to request information from the Google Places API
and provide a list of the top two places based on a given place_type and optional
food_preference. We've limited this function to the top two results to manage usage since this is
a paid service. However, you can modify this to retrieve any number of results as per your
requirement.
The function is configured with a hardcoded location (set to the coordinates of the
Transamerica Pyramid), your Google API key, and specific request parameters. Depending on
the place_type, it formulates the appropriate API request URL. If the place_type is a restaurant
and a food_preference is specified, it is included in the API request.
After sending the GET request, the function checks the response status. If it's successful, it
processes the JSON response, extracts the relevant details using the get_place_details function,
and returns them in a human-readable format. If the request fails, it prints out the error for
debugging.
The get_place_details function is used to retrieve more detailed information about a place, given
its place_id. It sends a GET request to the Google Place Details API and returns the result if the
request is successful. If the request fails, it prints out the error for debugging.
Both functions handle exceptions and return an error message if something goes wrong.
def call_google_places_api(user_id, place_type, food_preference=None):
    try:
        # Hardcoded location (Transamerica Pyramid, as described above) and API key;
        # the 500 m radius is an illustrative assumption.
        LOCATION = "37.7955,-122.4026"
        API_KEY = os.getenv("GOOGLE_PLACES_API_KEY")
        RADIUS = 500

        # If the place_type is restaurant and food_preference is not None, include it in the API request
        if place_type == 'restaurant' and food_preference:
            URL = f"https://maps.googleapis.com/maps/api/place/nearbysearch/json?location={LOCATION}&radius={RADIUS}&type={place_type}&keyword={food_preference}&key={API_KEY}"
        else:
            URL = f"https://maps.googleapis.com/maps/api/place/nearbysearch/json?location={LOCATION}&radius={RADIUS}&type={place_type}&key={API_KEY}"

        response = requests.get(URL)
        if response.status_code == 200:
            results = json.loads(response.content)["results"]
            places = []
            for place in results[:2]:  # limit to top 2 results
                place_id = place.get("place_id")
                place_details = get_place_details(place_id, API_KEY)  # Get the details of the place
                # Assuming get_place_details returns the place's "result" fields
                place_name = place_details.get("name", "N/A")
                place_address = place_details.get("vicinity", "N/A")
                place_info = f"{place_name} is a {place_type} located at {place_address}"
                places.append(place_info)
            return places
        else:
            print(f"Google Places API request failed with status code {response.status_code}")
            print(f"Response content: {response.content}")  # print out the response content for debugging
            return []
    except Exception as e:
        print(f"Error during the Google Places API call: {e}")
        return []
First, it fetches the customer's profile using their user_id . If no profile is found, it returns an
error message.
With a valid profile, it extracts the customer's food preferences and then interacts with the
OpenAI model. It provides an initial system message, giving context to the AI model about its
role, user preferences, and the usage of the Google Places API function.
The user input is also sent to the model as a message, and the function
call_google_places_api is defined in the functions parameter for the AI model to call as
needed.
Finally, it processes the model's response. If the model makes a function call to the Google
Places API, the function is executed with the appropriate arguments, and the names of nearby
places are returned. If there are no such places or the request isn't understood, appropriate
error messages are returned.
def provide_user_specific_recommendations(user_input, user_id):
    customer_profile = fetch_customer_profile(user_id)
    if customer_profile is None:
        return "I couldn't find your profile. Could you please verify your user ID?"
    customer_profile_str = json.dumps(customer_profile)
    # Use the first stored food preference (if any) when calling the Places API
    food_prefs = customer_profile.get("preferences", {}).get("food", [])
    food_preference = food_prefs[0] if food_prefs else None
response = client.chat.completions.create(
model="gpt-3.5-turbo",
messages=[
{
"role": "system",
"content": f"You are a sophisticated AI assistant, a specialist in user intent detection and
},
{"role": "user", "content": user_input}
],
temperature=0,
tools=[
{
"type": "function",
"function" : {
"name": "call_google_places_api",
"description": "This function calls the Google Places API to find the top places
"parameters": {
"type": "object",
"properties": {
"place_type": {
"type": "string",
"description": "The type of place to search for."
}
}
},
"result": {
"type": "array",
"items": {
"type": "string"
}
}
}
}
],
)
print(response.choices[0].message.tool_calls)
if response.choices[0].finish_reason=='tool_calls':
function_call = response.choices[0].message.tool_calls[0].function
if function_call.name == "call_google_places_api":
place_type = json.loads(function_call.arguments)["place_type"]
places = call_google_places_api(user_id, place_type, food_preference)
if places: # If the list of places is not empty
return f"Here are some places you might be interested in: {' '.join(places)}"
else:
return "I couldn't find any places of interest nearby."
Upon execution, the function fetches the user's profile, interacts with the AI model, processes
the model's response, calls the Google Places API if necessary, and ultimately returns a list of
recommendations tailored to the user's preferences and location. The printed output would
consist of these personalized recommendations.
user_id = "user1234"
user_input = "I'm hungry"
output = provide_user_specific_recommendations(user_input, user_id)
print(output)
[ChatCompletionMessageToolCall(id='call_Q1mXIi7D6GhobfE4tkruX7nB', function=Function(arguments=
Here are some places you might be interested in: Sotto Mare is a restaurant located at 552 Gree
Azure functions example
Krista Pratico
Open in Github
Jul 20, 2023
This notebook shows how to use the function calling capability with the Azure OpenAI service.
Functions allow a caller of chat completions to define capabilities that the model can use to
extend its functionality into external tools and data sources.
You can read more about chat functions on OpenAI's blog: https://openai.com/blog/function-
calling-and-other-api-updates
NOTE: Chat functions require the -0613 (or later) versions of the gpt-4 and gpt-35-turbo models.
They are not supported by older versions of the models.
Setup
First, we install the necessary dependencies and import the libraries we will be using.
import os
import openai
import dotenv
dotenv.load_dotenv()
Authentication
The Azure OpenAI service supports multiple authentication mechanisms that include API keys
and Azure Active Directory token credentials.
use_azure_active_directory = False # Set this flag to True if you are using Azure Active Directory
To set up the OpenAI SDK to use an Azure API Key, we need to set api_key to a key associated
with your endpoint (you can find this key in "Keys and Endpoints" under "Resource Management"
in the Azure Portal). You'll also find the endpoint for your resource here.
if not use_azure_active_directory:
endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]
api_key = os.environ["AZURE_OPENAI_API_KEY"]
client = openai.AzureOpenAI(
azure_endpoint=endpoint,
api_key=api_key,
api_version="2023-09-01-preview"
)
Let's now see how we can authenticate via Azure Active Directory. We'll start by installing the
azure-identity library. This library will provide the token credentials we need to authenticate
and help us build a token credential provider through the get_bearer_token_provider helper
function. It's recommended to use get_bearer_token_provider over providing a static token to
AzureOpenAI because this API will automatically cache and refresh tokens for you.
For more information on how to set up Azure Active Directory authentication with Azure
OpenAI, see the documentation.
from azure.identity import DefaultAzureCredential, get_bearer_token_provider

if use_azure_active_directory:
    endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]

    client = openai.AzureOpenAI(
        azure_endpoint=endpoint,
        azure_ad_token_provider=get_bearer_token_provider(DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"),
        api_version="2023-09-01-preview",
    )
Note: the AzureOpenAI client infers several of its arguments (for example the API key and the
endpoint) from their corresponding environment variables if they are not provided.
Deployments
In this section we are going to create a deployment of a GPT model that we can use to call
functions.
Let's deploy a model to use with chat completions. Go to https://portal.azure.com, find your
Azure OpenAI resource, and then navigate to the Azure OpenAI Studio. Click on the
"Deployments" tab and then create a deployment for the model you want to use for chat
completions. The deployment name that you give the model will be used in the code below.
deployment = "" # Fill in the deployment name from the portal here
Functions
With setup and authentication complete, you can now use functions with the Azure OpenAI
service. This will be split into a few steps:
A list of functions can be defined, each containing the name of the function, an optional
description, and the parameters the function accepts (described as a JSON schema).
functions = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "format": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "The temperature unit to use. Infer this from the users location."
                    },
                },
                "required": ["location"],
            },
        },
    }
]
Now we can pass the function into the chat completions API. If the model determines it should
call the function, a finish_reason of "tool_calls" will be populated on the choice and the
details of which function to call and its arguments will be present in the message. Optionally,
you can set the tool_choice keyword argument to force the model to call a particular function
(e.g. {"type": "function", "function": {"name": "get_current_weather"}}). By default, this is
set to auto, allowing the model to choose whether to call the function or not.
messages = [
{"role": "system", "content": "Don't make assumptions about what values to plug into functions. A
{"role": "user", "content": "What's the weather like today in Seattle?"}
]
chat_completion = client.chat.completions.create(
model=deployment,
messages=messages,
tools=functions,
)
print(chat_completion)
The name of the function call will be one that was provided initially and the arguments will
include JSON matching the schema included in the function definition.
import json
def get_current_weather(request):
"""
This function is for illustrative purposes.
The location and unit should be used to determine weather
instead of returning a hardcoded response.
"""
location = request.get("location")
unit = request.get("unit")
return {"temperature": "22", "unit": "celsius", "description": "Sunny"}
function_call = chat_completion.choices[0].message.tool_calls[0].function
print(function_call.name)
print(function_call.arguments)
if function_call.name == "get_current_weather":
response = get_current_weather(json.loads(function_call.arguments))
The response from the function should be serialized into a new message with the role set to
"function". Now the model will use the response data to formulate its answer.
messages.append(
{
"role": "function",
"name": "get_current_weather",
"content": json.dumps(response)
}
)
function_completion = client.chat.completions.create(
model=deployment,
messages=messages,
tools=functions,
)
print(function_completion.choices[0].message.content.strip())
Semantic search using Elasticsearch and OpenAI
Liam Thompson
Open in Github
Aug 28, 2023
Open in Colab
Perform semantic search on the Elasticsearch index using the encoded question
# install packages
# import modules
Connect to Elasticsearch
ℹ️ We're using an Elastic Cloud deployment of Elasticsearch for this notebook. If you don't
already have an Elastic deployment, you can sign up for a free Elastic Cloud trial.
To connect to Elasticsearch, you need to create a client instance with the Cloud ID and password
for your deployment.
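For example, with the official Python client (the Cloud ID and the password for the elastic user are prompted for here; adapt the authentication to your own deployment):

from getpass import getpass
from elasticsearch import Elasticsearch

CLOUD_ID = getpass("Elastic Cloud ID: ")
ELASTIC_PASSWORD = getpass("Elastic password: ")

client = Elasticsearch(
    cloud_id=CLOUD_ID,
    basic_auth=("elastic", ELASTIC_PASSWORD),
)
# Confirm the client is connected
print(client.info())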
In this step we download the OpenAI Wikipedia embeddings dataset, and extract the zip file.
embeddings_url = 'https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedde
wget.download(embeddings_url)
with zipfile.ZipFile("vector_database_wikipedia_articles_embedded.zip",
"r") as zip_ref:
zip_ref.extractall("data")
Next we use the Pandas library to read the unzipped CSV file into a DataFrame. This step makes
it easier to index the data into Elasticsearch in bulk.
wikipedia_dataframe = pd.read_csv("data/vector_database_wikipedia_articles_embedded.csv")
Now we need to create an Elasticsearch index with the necessary mappings. This will enable us
to index the data into Elasticsearch.
We use the dense_vector field type for the title_vector and content_vector fields. This is a
special field type that allows us to store dense vectors in Elasticsearch.
Later, we'll need to target the dense_vector field for kNN search.
index_mapping= {
"properties": {
"title_vector": {
"type": "dense_vector",
"dims": 1536,
"index": "true",
"similarity": "cosine"
},
"content_vector": {
"type": "dense_vector",
"dims": 1536,
"index": "true",
"similarity": "cosine"
},
"text": {"type": "text"},
"title": {"type": "text"},
"url": { "type": "keyword"},
"vector_id": {"type": "long"}
}
}
client.indices.create(index="wikipedia_vector_index", mappings=index_mapping)
The following function generates the required bulk actions that can be passed to Elasticsearch's
Bulk API, so we can index multiple documents efficiently in a single request.
For each row in the DataFrame, the function yields a dictionary representing a single document
to be indexed.
def dataframe_to_bulk_actions(df):
for index, row in df.iterrows():
yield {
"_index": 'wikipedia_vector_index',
"_id": row['id'],
"_source": {
'url' : row["url"],
'title' : row["title"],
'text' : row["text"],
'title_vector' : json.loads(row["title_vector"]),
'content_vector' : json.loads(row["content_vector"]),
'vector_id' : row["vector_id"]
}
}
As the dataframe is large, we will index data in batches of 100. We index the data into
Elasticsearch using the Python client's helpers for the bulk API.
start = 0
end = len(wikipedia_dataframe)
batch_size = 100
for batch_start in range(start, end, batch_size):
batch_end = min(batch_start + batch_size, end)
batch_dataframe = wikipedia_dataframe.iloc[batch_start:batch_end]
actions = dataframe_to_bulk_actions(batch_dataframe)
helpers.bulk(client, actions)
print(client.search(index="wikipedia_vector_index", body={
"_source": {
"excludes": ["title_vector", "content_vector"]
},
"query": {
"match": {
"text": {
"query": "Hummingbird"
}
}
}
}))
To perform semantic search, we need to encode queries with the same embedding model used
to encode the documents at index time. In this example, we need to use the text-embedding-3-
small model.
# Define model
EMBEDDING_MODEL = "text-embedding-3-small"
# Define question
question = 'Is the Atlantic the biggest ocean in the world?'
# Create embedding
question_embedding = openai.Embedding.create(input=question, model=EMBEDDING_MODEL)
Now we're ready to run queries against our Elasticsearch index using our encoded question.
We'll be doing a k-nearest neighbors search, using the Elasticsearch kNN query option.
def pretty_response(response):
for hit in response['hits']['hits']:
id = hit['_id']
score = hit['_score']
title = hit['_source']['title']
text = hit['_source']['text']
pretty_output = (f"\nID: {id}\nTitle: {title}\nSummary: {text}\nScore: {score}")
print(pretty_output)
response = client.search(
index = "wikipedia_vector_index",
knn={
"field": "content_vector",
"query_vector": question_embedding["data"][0]["embedding"],
"k": 10,
"num_candidates": 100
}
)
pretty_response(response)
ID: 1936
Title: Atlantic Ocean
Summary: The Atlantic Ocean is the world's second largest ocean. It covers a total area of abo
Geologic history
The Atlantic formed when the Americas moved west from Eurasia and Africa. This began sometime i
The east coast of South America is shaped somewhat like the west coast of Africa, and this gave
Geography
The Atlantic Ocean is bounded on the west by North and South America. It connects to the Arctic
In the southeast, the Atlantic merges into the Indian Ocean. The 20° East meridian defines its
In the southwest, the Drake Passage connects it to the Pacific Ocean. The Panama Canal links th
The Atlantic Ocean is second in size to the Pacific. It occupies an area of about . The volume
The average depth of the Atlantic, along with its adjacent seas, is . The greatest depth is Mil
Gulf Stream
The Atlantic Ocean has important ocean currents. One of these, called the Gulf Stream, flows a
There are currents in the South Atlantic too, but the shape of this sea means that it has less
Geology
The main feature of the Atlantic Ocean's seabed is a large underwater mountain chain called the
Next steps
Success! Now you know how to use Elasticsearch as a vector database to store embeddings,
encode queries by calling the OpenAI embeddings endpoint, and run semantic search.
Play around with different queries, and if you want to try with your own data, you can
experiment with different embedding models.
ℹ️ Check out our other notebook Retrieval augmented generation using Elasticsearch and
OpenAI. That notebook builds on this example to demonstrate how to use Elasticsearch
together with the OpenAI chat completions API for retrieval augmented generation (RAG).
Visualizing embeddings in Atlas
Andriy Mulyar, Tomas Dulka
Open in Github
Mar 27, 2023
In this example, we will upload food review embeddings to Atlas to visualize the embeddings.
What is Atlas?
Atlas is a machine learning tool used to visualize massive datasets of embeddings in your web
browser. Upload millions of embeddings to Atlas and interact with them in your web browser or
jupyter notebook.
1. Login to Atlas.
import pandas as pd
import numpy as np
from ast import literal_eval
import nomic
from nomic import atlas
nomic.login('7xDPkYXSYDc1_ErdTPIcoAR9RNd8YDlkS3nVNXcVoIMZ6') #demo account
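The embeddings and metadata are expected to come from the food-review embeddings used elsewhere in the Cookbook; the file path and column names below are assumptions, so adjust them to your own data:

# Load the review data and parse the stringified embedding column into a numeric matrix
df = pd.read_csv('data/fine_food_reviews_with_embeddings_1k.csv')
embeddings = np.array(df.embedding.apply(literal_eval).to_list())
df = df.drop('embedding', axis=1)
df = df.rename(columns={'Unnamed: 0': 'id'})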
data = df.to_dict('records')
project = atlas.map_embeddings(embeddings=embeddings, data=data,
id_field='id',
colorable_fields=['Score'])
map = project.maps[0]
map
Project: meek-laborer
This notebook takes you through a simple flow to download some data, embed it, and then
index and search it using a selection of vector databases. This is a common requirement for
customers who want to store and search our embeddings with their own data in a secure
environment to support production use cases such as chatbots, topic modelling and more.
Vector databases enable enterprises to take many of the embeddings use cases we've shared in
this repo (question and answering, chatbot and recommendation services, for example), and
make use of them in a secure, scalable environment. Many of our customers use embeddings to
solve their problems at small scale, but performance and security hold them back from going
into production - we see vector databases as a key component in solving that, and in this guide
we'll walk through the basics of embedding text data, storing it in a vector database and using it
for semantic search.
Demo Flow
The demo flow is:
Setup: Import packages and set any required variables
Typesense
Setup: Set up the Typesense Python client. For more details go here
Index Data: We'll create a collection and index it for both titles and content.
Search Data: Run a few example queries with various goals in mind.
Once you've run through this notebook you should have a basic understanding of how to setup
and use vector databases, and can move on to more complex use cases making use of our
embeddings.
Setup
Import the required libraries and set the embedding model that we'd like to use.
import openai
# I've set this to our new embeddings model; this can be changed to the embedding model of your choice
EMBEDDING_MODEL = "text-embedding-3-small"
# Ignore unclosed SSL socket warnings - optional in case you get these errors
import warnings
In this section we'll load embedded data that we've prepared prior to this session.
embeddings_url = 'https://cdn.openai.com/API/examples/data/vector_database_wikipedia_articles_embedde
import zipfile
with zipfile.ZipFile("vector_database_wikipedia_articles_embedded.zip","r") as zip_ref:
zip_ref.extractall("../data")
article_df = pd.read_csv('../data/vector_database_wikipedia_articles_embedded.csv')
article_df.head()
article_df.info(show_counts=True)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25000 entries, 0 to 24999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 25000 non-null int64
1 url 25000 non-null object
2 title 25000 non-null object
3 text 25000 non-null object
4 title_vector 25000 non-null object
5 content_vector 25000 non-null object
6 vector_id 25000 non-null object
dtypes: int64(1), object(6)
memory usage: 1.3+ MB
Typesense
The next vector store we'll look at is Typesense, which is an open source, in-memory search
engine, that you can either self-host or run on Typesense Cloud.
Typesense focuses on performance by storing the entire index in RAM (with a backup on disk)
and also focuses on providing an out-of-the-box developer experience by simplifying available
options and setting good defaults. It also lets you combine attribute-based filtering together
with vector queries.
For this example, we will set up a local docker-based Typesense server, index our vectors in
Typesense and then do some nearest-neighbor search queries. If you use Typesense Cloud, you
can skip the docker setup part and just obtain the hostname and API keys from your cluster
dashboard.
Setup
To run Typesense locally, you'll need Docker. Following the instructions contained in the
Typesense documentation here, we created an example docker-compose.yml file in this repo
saved at ./typesense/docker-compose.yml.
After starting Docker, you can start Typesense locally by navigating to the
examples/vector_databases/typesense/ directory and running docker-compose up -d .
The default API key is set to xyz in the Docker compose file, and the default Typesense port to
8108 .
import typesense
typesense_client = \
typesense.Client({
"nodes": [{
"host": "localhost", # For Typesense Cloud use xxx.a1.typesense.net
"port": "8108", # For Typesense Cloud use 443
"protocol": "http" # For Typesense Cloud use https
}],
"api_key": "xyz",
"connection_timeout_seconds": 60
})
Index data
To index vectors in Typesense, we'll first create a Collection (which is a collection of Documents)
and turn on vector indexing for a particular field. You can even store multiple vector fields in a
single document.
schema = {
"name": "wikipedia_articles",
"fields": [
{
"name": "content_vector",
"type": "float[]",
"num_dim": len(article_df['content_vector'][0])
},
{
"name": "title_vector",
"type": "float[]",
"num_dim": len(article_df['title_vector'][0])
}
]
}
create_response = typesense_client.collections.create(schema)
print(create_response)
document_counter = 0
documents_batch = []

for k, v in article_df.iterrows():
    # Notice how you can add any fields that you haven't added to the schema to the document.
    # These will be stored on disk and returned when the document is a hit.
    # This is useful to store attributes required for display purposes.
    document = {
        "title_vector": v["title_vector"],
        "content_vector": v["content_vector"],
        "title": v["title"],
        "content": v["text"],
    }
    documents_batch.append(document)
    document_counter = document_counter + 1

    # Import a batch of 100 documents (and any remainder at the very end)
    if document_counter % 100 == 0 or document_counter == len(article_df):
        typesense_client.collections['wikipedia_articles'].documents.import_(documents_batch)
        documents_batch = []
        print(f"Processed {document_counter} / {len(article_df)} ")
collection = typesense_client.collections['wikipedia_articles'].retrieve()
print(f'Collection has {collection["num_documents"]} documents')
Search Data
Now that we've imported the vectors into Typesense, we can do a nearest neighbor search on
the title_vector or content_vector field.
def query_typesense(query, field='title', top_k=20):
    # Create an embedding vector from the user query (OpenAI Python v1 syntax)
    embedded_query = openai.embeddings.create(
        input=query,
        model=EMBEDDING_MODEL,
    ).data[0].embedding

    typesense_results = typesense_client.multi_search.perform({
        "searches": [{
            "q": "*",
            "collection": "wikipedia_articles",
            "vector_query": f"{field}_vector:([{','.join(str(v) for v in embedded_query)}], k:{top_k})"
        }]
    }, {})
    return typesense_results
Thanks for following along, you're now equipped to set up your own vector databases and use
embeddings to do all kinds of cool things - enjoy! For more complex use cases please continue
to work through other cookbook examples in this repo.
openai / openai-cookbook
cookbook.openai.com
MIT license
✨ Navigate at cookbook.openai.com
Example code and guides for accomplishing common tasks with the OpenAI API. To run these
examples, you'll need an OpenAI account and associated API key (create a free account here).
Most code examples are written in Python, though the concepts can be applied in any language.
For other useful tools, guides and courses, check out these related resources from around the web.
Contributing
The OpenAI Cookbook is a community-driven resource. Whether you're submitting an idea, fixing a
typo, adding a new guide, or improving an existing one, your contributions are greatly appreciated!
Before contributing, read through the existing issues and pull requests to see if someone else is
already working on something similar. That way you can avoid duplicating efforts.
If there are examples or guides you'd like to see, feel free to suggest them on the issues page.
If you'd like to contribute new content, make sure to read through our contribution guidelines. We
welcome high-quality submissions of new examples and guides, as long as they meet our criteria and
fit within the scope of the cookbook.
The contents of this repo are automatically rendered into cookbook.openai.com based on
registry.yaml.