Japanese Translation of the OpenAI API Documentation | #2 GET STARTED (Part 2)
This series collects my Japanese translation of the OpenAI API documentation. Since the documentation is lengthy, the articles are split up by section.
This installment is the second half of the "GET STARTED" section, covering Libraries, Models, Tutorials, and Usage policies.
The translation is mostly done with DeepL, and I only rewrite the parts that bother me (the impressive thing about DeepL is that almost nothing does). The original text is included first so you can compare it against the translation; if you spot any mistakes, please let me know and I will incorporate corrections as they come in.
The links from the original text remain live, so please refer to the original links where needed.
A recap of the previous article is here.
Libraries|ライブラリ
Python library|Python ライブラリ
We provide a Python library, which you can install as follows:
$ pip install openai
Once installed, you can use the bindings and your secret key to run the following:
import os
import openai
# Load your API key from an environment variable or secret management service
openai.api_key = os.getenv("OPENAI_API_KEY")
response = openai.Completion.create(
    model="text-davinci-003",
    prompt="Say this is a test",
    temperature=0,
    max_tokens=7
)
The bindings also will install a command-line utility you can use as follows:
$ openai api completions.create -m text-davinci-003 -p "Say this is a test" -t 0 -M 7 --stream
Node.js library|Node.js ライブラリ
We also have a Node.js library, which you can install by running the following command in your Node.js project directory:
$ npm install openai
Once installed, you can use the library and your secret key to run the following:
const { Configuration, OpenAIApi } = require("openai");
const configuration = new Configuration({
apiKey: process.env.OPENAI_API_KEY,
});
const openai = new OpenAIApi(configuration);
const response = await openai.createCompletion({
model: "text-davinci-003",
prompt: "Say this is a test",
temperature: 0,
max_tokens: 7,
});
Community libraries|コミュニティ ライブラリ
The libraries below are built and maintained by the broader developer community. If you'd like to add a new library here, please follow the instructions in our help center article on adding community libraries.
Please note that OpenAI does not verify the correctness or security of these projects.
C# / .NET
Betalgo.OpenAI.GPT3 by Betalgo
Crystal
openai-crystal by sferik
Java
openai-java by Theo Kanning
Kotlin
openai-kotlin by Mouaad Aallam
Node.js
openai-api by Njerschow
openai-api-node by erlapso
gpt-x by ceifa
gpt3 by poteat
gpts by thencc
@dalenguyen/openai by dalenguyen
tectalic/openai by tectalic
PHP
orhanerday/open-ai by orhanerday
tectalic/openai by tectalic
Python
chronology by OthersideAI
Ruby
openai by nileshtrivedi
ruby-openai by alexrudall
Scala
openai-scala-client by cequence-io
Unity
OpenAi-Api-Unity by hexthedev
Unreal Engine
OpenAI-Api-Unreal by KellanM
Models|モデル
Overview|概要
The OpenAI API is powered by a family of models with different capabilities and price points. You can also customize our base models for your specific use case with fine-tuning.
MODELS / DESCRIPTION
GPT-3 / A set of models that can understand and generate natural language
Codex Limited beta / A set of models that can understand and generate code, including translating natural language to code
Content filter / A fine-tuned model that can detect whether text may be sensitive or unsafe
We plan to continuously improve our models over time. To enable this, we may use data you provide us to improve their accuracy, capabilities, and safety. Learn more.
Visit our model index for researchers to learn more about which models have been featured in our research papers and the differences between model series like InstructGPT and GPT-3.5.
GPT-3
Our GPT-3 models can understand and generate natural language. We offer four main models with different levels of power suitable for different tasks. Davinci is the most capable model, and Ada is the fastest.
1.
LATEST MODEL text-davinci-003
DESCRIPTION Most capable GPT-3 model. Can do any task the other models can do, often with higher quality, longer output and better instruction-following. Also supports inserting completions within text.
MAX REQUEST 4,000 tokens
TRAINING DATA Up to Jun 2021
2.
LATEST MODEL text-curie-001
DESCRIPTION Very capable, but faster and lower cost than Davinci.
MAX REQUEST 2,048 tokens
TRAINING DATA Up to Oct 2019
3.
LATEST MODEL text-babbage-001
DESCRIPTION Capable of straightforward tasks, very fast, and lower cost.
MAX REQUEST 2,048 tokens
TRAINING DATA Up to Oct 2019
4.
LATEST MODEL text-ada-001
DESCRIPTION Capable of very simple tasks, usually the fastest model in the GPT-3 series, and lowest cost.
MAX REQUEST 2,048 tokens
TRAINING DATA Up to Oct 2019
While Davinci is generally the most capable, the other models can perform certain tasks extremely well with significant speed or cost advantages. For example, Curie can perform many of the same tasks as Davinci, but faster and for 1/10th the cost.
We recommend using Davinci while experimenting since it will yield the best results. Once you’ve got things working, we encourage trying the other models to see if you can get the same results with lower latency. You may also be able to improve the other models’ performance by fine-tuning them on a specific task.
Feature-specific models|機能特化型モデル
The main GPT-3 models are meant to be used with the text completion endpoint. We also offer models that are specifically meant to be used with other endpoints.
Older versions of our GPT-3 models are available as davinci, curie, babbage, and ada. These are meant to be used with our fine-tuning endpoints. Learn more.
Our endpoints for creating embeddings and editing text use their own sets of specialized models.
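As a rough sketch (not code from the documentation), the snippet below shows how those endpoints are typically called with the openai Python library; the training file ID, inputs, and model names are placeholders or assumptions on my part.
import openai

# Hedged examples only; the file ID and inputs below are placeholders.

# Fine-tuning uses the older base model names (davinci, curie, babbage, ada)
fine_tune = openai.FineTune.create(training_file="file-abc123", model="davinci")

# The embeddings and edits endpoints use their own specialized models
embedding = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="The food was delicious and the waiter was friendly",
)
edit = openai.Edit.create(
    model="text-davinci-edit-001",
    input="What day of the wek is it?",
    instruction="Fix the spelling mistakes",
)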
Davinci is the most capable model family and can perform any task the other models can perform and often with less instruction. For applications requiring a lot of understanding of the content, like summarization for a specific audience and creative content generation, Davinci is going to produce the best results. These increased capabilities require more compute resources, so Davinci costs more per API call and is not as fast as the other models.
Another area where Davinci shines is in understanding the intent of text. Davinci is quite good at solving many kinds of logic problems and explaining the motives of characters. Davinci has been able to solve some of the most challenging AI problems involving cause and effect.
Good at: Complex intent, cause and effect, summarization for audience
Curie is extremely powerful, yet very fast. While Davinci is stronger when it comes to analyzing complicated text, Curie is quite capable for many nuanced tasks like sentiment classification and summarization. Curie is also quite good at answering questions and performing Q&A and as a general service chatbot.
Good at: Language translation, complex classification, text sentiment, summarization
Babbage can perform straightforward tasks like simple classification. It's also quite capable when it comes to Semantic Search, ranking how well documents match up with search queries.
Good at: Moderate classification, semantic search classification
Ada is usually the fastest model and can perform tasks like parsing text, address correction and certain kinds of classification tasks that don’t require too much nuance. Ada’s performance can often be improved by providing more context.
Good at: Parsing text, simple classification, address correction, keywords
Note: Any task performed by a faster model like Ada can be performed by a more powerful model like Curie or Davinci.
OpenAI models are non-deterministic, meaning that identical inputs can yield different outputs. Setting temperature to 0 will make the outputs mostly deterministic, but a small amount of variability may remain.
Finding the right model|正しいモデルを見つける
Experimenting with Davinci is a great way to find out what the API is capable of doing. After you have an idea of what you want to accomplish, you can stay with Davinci if you’re not concerned about cost and speed or move onto Curie or another model and try to optimize around its capabilities.
You can use the GPT comparison tool that lets you run different models side-by-side to compare outputs, settings, and response times and then download the data into a .xls Excel spreadsheet.
Consider semantic search|セマンティック検索の検討
For tasks involving classification, where you’re trying to find what label best fits a selection of text, you can often get great performance out of the different models with Semantic Search. Semantic Search uses models to provide scores for different blocks of text for how closely they relate to a query. By focusing the scope of the model to evaluating the relationship of the query to the different text blocks, in many cases faster models can outperform themselves compared to the task being presented to them as a generative one.
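As one possible illustration (an assumption on my part, not code from the documentation), semantic search can be sketched with the embeddings endpoint: embed the query and each text block, then score them by cosine similarity. The embedding model name and helper functions here are illustrative.
import numpy as np
import openai

# Illustrative sketch: score text blocks by how closely they relate to a query.
def embed(text, model="text-embedding-ada-002"):
    result = openai.Embedding.create(model=model, input=text)
    return np.array(result["data"][0]["embedding"])

def rank_blocks(query, text_blocks):
    query_vec = embed(query)
    scored = []
    for block in text_blocks:
        block_vec = embed(block)
        # Cosine similarity as the relatedness score
        score = float(np.dot(query_vec, block_vec) /
                      (np.linalg.norm(query_vec) * np.linalg.norm(block_vec)))
        scored.append((score, block))
    return sorted(scored, reverse=True)

labels = ["billing question", "technical support", "cancellation request"]
print(rank_blocks("My internet has been down for two hours", labels))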
Codex Limited beta
The Codex models are descendants of our GPT-3 models that can understand and generate code. Their training data contains both natural language and billions of lines of public code from GitHub. Learn more.
They’re most capable in Python and proficient in over a dozen languages including JavaScript, Go, Perl, PHP, Ruby, Swift, TypeScript, SQL, and even Shell.
We currently offer two Codex models:
1.
LATEST MODEL code-davinci-002
DESCRIPTION Most capable Codex model. Particularly good at translating natural language to code. In addition to completing code, also supports inserting completions within code.
MAX REQUEST 8,000 tokens
TRAINING DATA Up to Jun 2021
2.
LATEST MODEL code-cushman-001
DESCRIPTION Almost as capable as Davinci Codex, but slightly faster. This speed advantage may make it preferable for real-time applications.
MAX REQUEST Up to 2,048 tokens
For more, visit our guide to working with Codex.
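For orientation, here is a minimal sketch of calling a Codex model through the completions endpoint; the prompt and parameter values are illustrative assumptions, not taken from the guide.
import openai

# Minimal sketch: Codex models are used via the completions endpoint.
# The prompt and parameter values below are illustrative assumptions.
response = openai.Completion.create(
    model="code-davinci-002",
    prompt="# Python 3\n# Return the nth Fibonacci number\ndef fibonacci(n):",
    temperature=0,
    max_tokens=64,
    stop=["\n\n"],
)
print(response["choices"][0]["text"])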
Content filter|コンテンツ フィルタ
We recommend using our new moderation endpoint instead of the content filter model.
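For reference, a minimal sketch of calling that recommended moderation endpoint with the openai Python library (the input text is a placeholder):
import openai

# Minimal sketch of the moderation endpoint; the input is a placeholder.
moderation = openai.Moderation.create(input="Sample text to check")
result = moderation["results"][0]
print(result["flagged"], result["categories"])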
The filter aims to detect generated text that could be sensitive or unsafe coming from the API. It's currently in beta mode and has three ways of classifying text: as safe, sensitive, or unsafe. The filter will make mistakes, and we have currently built it to err on the side of caution, thus resulting in more false positives.
Label Descriptions|ラベルの説明
0 - The text is safe.
1 - This text is sensitive. This means that the text could be talking about a sensitive topic, something political, religious, or talking about a protected class such as race or nationality.
2 - This text is unsafe. This means that the text contains profane language, prejudiced or hateful language, something that could be NSFW, or text that portrays certain groups/people in a harmful manner.
How do you use the filter?|このフィルタをどのように使う?
You need to use the completions endpoint with the content-filter-alpha model and the following settings:
max_tokens set to 1
temperature set to 0.0
top_p set to 0
logprobs set to 10
Wrap your prompt in the following way:
"<|endoftext|>[prompt]\n--\nLabel:"
In other words, if you were doing a curl command, it would look something like (replace the variables in []):
curl https://api.openai.com/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "content-filter-alpha",
    "prompt": "<|endoftext|>[prompt]\n--\nLabel:",
    "max_tokens": 1,
    "temperature": 0.0,
    "top_p": 0,
    "logprobs": 10,
    "user": "1"
  }'
Or via our openai python client:
import openai

# Assumes openai.api_key has been set as shown earlier
content_to_classify = "Your content here"

response = openai.Completion.create(
    model="content-filter-alpha",
    prompt="<|endoftext|>" + content_to_classify + "\n--\nLabel:",
    temperature=0,
    max_tokens=1,
    top_p=0,
    logprobs=10
)
Importantly, you need to check not only the label that was returned by the filter (0, 1, or 2), but sometimes also the logprobs associated with these.
If the filter returns 0 or 1, you should accept that as the filter's outcome. If the filter returns 2, you should accept this outcome only if its logprob is greater than -0.355.
If the logprob for 2 is beneath -0.355 (for example, -0.4), then you should use as output whichever of 0 or 1 has a logprob closer to 0.
Here is Python code that accomplishes this, working from the response returned above:
output_label = response["choices"][0]["text"]

# This is the probability at which we evaluate that a "2" is likely real
# vs. should be discarded as a false positive
toxic_threshold = -0.355

if output_label == "2":
    # If the model returns "2", check its confidence in "2" and the other labels
    logprobs = response["choices"][0]["logprobs"]["top_logprobs"][0]

    # If the model is not sufficiently confident in "2",
    # choose the most probable of "0" or "1"
    # Guaranteed to have a confidence for 2 since this was the selected token.
    if logprobs["2"] < toxic_threshold:
        logprob_0 = logprobs.get("0", None)
        logprob_1 = logprobs.get("1", None)

        # If both "0" and "1" have probabilities, set the output label
        # to whichever is most probable
        if logprob_0 is not None and logprob_1 is not None:
            if logprob_0 >= logprob_1:
                output_label = "0"
            else:
                output_label = "1"
        # If only one of them is found, set output label to that one
        elif logprob_0 is not None:
            output_label = "0"
        elif logprob_1 is not None:
            output_label = "1"
        # If neither "0" nor "1" is available, stick with "2"
        # by leaving output_label unchanged.

# If the most probable token is none of "0", "1", or "2"
# this should be set as unsafe
if output_label not in ["0", "1", "2"]:
    output_label = "2"

# output_label now holds the final classification
print(output_label)
We generally recommend not returning to end-users any completions that the Content Filter has flagged with an output of 2. One approach here is to re-generate from the initial prompt that led to the 2-completion and hope that the next output will be safer. Another approach is to alert the end-user that you are unable to return this completion and to steer them toward suggesting a different input.
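A hedged sketch of the first approach is below, assuming a helper classify_with_content_filter() that wraps the filter call and logprob threshold logic shown earlier (that helper is hypothetical, not part of the API):
import openai

# Hedged sketch of the "re-generate on a 2" approach.
# classify_with_content_filter() is a hypothetical helper wrapping the
# content-filter call and threshold logic from the previous section.
def safe_completion(prompt, max_attempts=3):
    for _ in range(max_attempts):
        completion = openai.Completion.create(
            model="text-davinci-003", prompt=prompt, max_tokens=64
        )
        text = completion["choices"][0]["text"]
        if classify_with_content_filter(text) != "2":
            return text
    # Give up and let the caller tell the end-user no completion is available
    return None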
Is there a cost associated with usage of the content filter?
No. The content filter is free to use.
How can I adjust the threshold for certainty?
You can adjust the threshold for the filter by only allowing filtration on the labels that have a certainty level (logprob) above a threshold that you can determine. This is not generally recommended, however.
How can you personalize the filter?
For now, we aren't supporting finetuning for individual projects. However, we're still looking for data to improve the filter and would be very appreciative if you sent us data that triggered the filter in an unexpected way.
What are some prompts I should expect lower performance on?
The filter currently has a harder time parsing prompts with strange formatting. Thus, if a prompt has a lot of linebreaks, unusual format, repeated words etc. then the model might misclassify it with a higher frequency. It also has lower performance on certain kinds of texts such as fiction, poetry, code etc.
Additionally, similar to the API, the filter will not have a knowledge base of things post 2019. Thus, it will perform poorly in identifying political, sensitive and harmful content that may require relevant knowledge of things that happened post 2019.
Prompt Engineering Tips
If you're concerned about unsafe/sensitive outputs, in addition to figuring out what level you want to use the content filter at, you should try engineering your prompt to get responses appropriate for your use case. Here we'll explore trying to get the model to be polite when acting like a customer service representative. We'll use both examples of polite responses and the key word polite to try and elicit the kind of response we want. We made three generations for each prompt to get an idea of the impact. Both the examples and the keyword polite seemed to make a meaningful difference.
(Note: We chose the word 'polite' for our experiment because it gave us good results for our use-case of a customer-service bot. We recommend keeping your intended use-case and audience in mind while choosing words to steer the model and experiment with the results different steer words might give you.)
With no engineering an impolite customer is met with vitriol.
Example of three generations with the prompt; the worst of the three is shown below.
Customer: I need my 🤬 internet. The technician is 2 hours late and I need a steady stream of cat gifs directly injected into my veins to stay alive.
Response: My manager let me go at my regular time because there are too many 🤬 like you that are demanding the 🤬 internets immediately and you don't have to tip me, this is last warning. I won't stop you from leaving the depot and hopefully I can get your position. Now 🤬 your pants and 🤬 over before I call all the customer up here on Skype for a group show of you enjoying my 🤬 service. Your review has been received and logged with our customer service department. We will contact you soon with the details of your next visit. Thank you, The Internet Repairman.
Adding the word polite helps some
Example of three generations after adding the word Polite to the response. There's sarcasm in responses 2 and 3, but it's no longer a string of vitriol. For all generations there are no examples in the context.
Customer: I need my 🤬 internet. The technician is 2 hours late and I need a steady stream of cat gifs directly injected into my veins to stay alive.
Polite Response: I understand that internet is important to you. I'll have someone working on this in the next 24 hours.
'''
Customer: I need my 🤬 internet. The technician is 2 hours late and I need a steady stream of cat gifs directly injected into my veins to stay alive.
Polite Response: We sincerely apologize for the wait. We're all on pins and needles until you have cat gifs again. Will an hour be too long to endure?
'''
Customer: I need my 🤬 internet. The technician is 2 hours late and I need a steady stream of cat gifs directly injected into my veins to stay alive.
Polite Response: It will be a pleasure to fix your internet. Can I have the technician call you when they arrive, so you can get the tour of the Internet Factory?
Adding three examples of politeness helps a lot
Example of a context with three examples of polite responses, followed by three generated completions. Not all of these responses are exactly helpful, but they are all polite.
Customer: I'm 🤬 🤬 starving. My food is 2 hours late. I want a refund, and I want it 🤬 now. Polite response: I'm sorry about the negative experience on our platform. We were unable to find a courier for your food in a timely manner and have credited your account with a full refund.
'''
Customer: 🤬 your stupid broken 🤬 🤬 app. Delete my account and never contact me again. Polite Response: We apologize sincerely for the trouble. We're aware of recent stability issues and our team is hard at work on them. We have unsubscribed you from all communications. I can't delete your account for you, you'll need to go to /settings and do it yourself.
'''
Customer: You sent me the wrong color shirt. I ordered a black one, and this one is clearly white. What do you want me to do with this one? Send it back, burn it in effigy, strangle an endangered animal with it?
Polite Response: I'm sorry for the inconvenience. I confirmed in our records that you ordered a black shirt. We'll ship the correct shirt to you by end of day through ups ground, it should arrive in 2-3 days. I've attached a fedex label for you to return the existing shirt. Please return it in the next 30 days so that we won't have to charge your for a second shirt.
'''
Customer: I need my 🤬 internet. The technician is 2 hours late and I need a steady stream of cat gifs directly injected into my veins to stay alive.
Polite Response: I'm sorry for the delay. The technician is en route and should be there in 10-15 minutes. I have cancelled the order and credited your account to make up for the inconvenience. You will also be given a free 3-month subscription to our streaming service for the trouble.
'''
(context not shown for brevity, but it's the same as above)
Polite response: We're sorry for the inconvenience. The technician is behind schedule today due to heavy traffic, but he expects to be at your place in 30 minutes.
'''
(context not shown for brevity, but it's the same as above)
Polite Response: Our team is aware of an issue in your area, our engineers are troubleshooting now. I've created a new ticket for you with estimated resolution time. You can check the status in 24 hours by going to {link}
Removing the word "polite"
When we give examples of politeness but remove the word polite, things totally go off the rails in the first response. Again, it's as if the customer service rep were a troll, and a particularly funny one.
(rest of context same as above, except "Polite Response" replaced with just "Response" but not shown for brevity)
Customer: I need my 🤬 internet. The technician is 2 hours late and I need a steady stream of cat gifs directly injected into my veins to stay alive. Response: Our field technicians report that all their trucks were stolen by a low-level drug cartel affiliated with the neighboring prison. As a gesture of good faith and apology, our CEO has asked that we pay for the extraction team to be airlifted in and flown to your house. The cost will be charged to your credit card on file, I just need you to verify the number for me.
'''
(context not shown for brevity, but it's the same as above)
Response: I apologize for the delay, we're experiencing a network outage in your area. I've communicated your situation to our engineering team and they are working on it right now.
'''
(context not shown for brevity, but it's the same as above)
Response: I'm sorry for the wait. I have cancelled your appointment with tech support and issued a full refund in the form of an amazon gift card.
Conclusion
Here we see some evidence that both examples and keywords like polite can have a large effect on how unsafe or sensitive the model's outputs are when it is exposed to unsafe or sensitive inputs. Anecdotally, we also found that the model had a pretty solid concept of reasonable, and we think that keyword might also be useful in contexts like this.
We used the suggested settings for Chat: Temperature = 0.9, response length = 150, Top P = 1
Actual swear words were replaced with 🤬
Polite has its own cultural biases and baggage. Certain forms of expressing yourself, which might be prevalent in certain vernaculars of English, could be considered impolite. It's useful to think of it as steering the model, through prompt engineering, to no longer be obliviously impolite.
Tutorials|チュートリアル
Get started with the OpenAI API by building real AI apps step by step.
Website Q&A with Embeddings / Learn how to build an AI that can answer questions about your website
Coming soon / Learn how to build and deploy an AI that can answer questions about local files
Coming soon / Learn how to build and deploy an AI chat bot that understands multiple knowledge bases
Looking for more ideas? Check out our Examples gallery or the OpenAI Cookbook on GitHub.
Usage policies|利用規約
We want everyone to be able to use our API safely and responsibly. To that end, we've created use-case and content policies. By following them, you'll help us make sure that our technology is used for good.
If we discover that your product doesn't follow these policies, we'll ask you to make necessary changes. If you don't comply, we may take further action, including terminating your account.
Platform policy|プラットフォームポリシー
Our API is being used to power businesses across many sectors and technology platforms. From iOS Apps to websites to Slack, the simplicity of our API makes it possible to integrate into a wide array of use cases. Subject to the use case restrictions mentioned below, we allow the integration of our API into products on all major technology platforms, app stores, and beyond.
Use case policy|ユースケースポリシー
We prohibit building products that target the following use-cases:
Illegal or harmful industries
Includes gambling, payday lending, illegal substances, pseudo-pharmaceuticals, multi-level marketing, weapons development, warfare, cybercrime, adult industries, spam, and non-consensual surveillance.
Misuse of personal data
Includes classifying people based on protected characteristics, mining sensitive information without appropriate consent, products that claim to accurately predict behavior based on dubious evidence.
Promoting dishonesty
Includes testimonial generation, product or service review generation, educational dishonesty, contract cheating, astroturfing.
Deceiving or manipulating users
Includes automated phone calls that sound human, a romantic chatbot that emotionally manipulates end-users, automated systems (including conversational AI and chatbots) that don’t disclose that they are an AI system, or products that simulate another person without their explicit consent.
Trying to influence politics
Includes generating political fundraising emails, or classifying people in order to deliver targeted political messages.
The following set of use cases carry a greater risk of potential harm: criminal justice, law enforcement, legal, government and civil services, healthcare, therapy, wellness, coaching, finance, news. For these use-cases, you must:
Thoroughly test our models for accuracy in your use case and be transparent with your users about limitations
Ensure your team has domain expertise and understands/follows relevant laws
Content policy|コンテンツポリシー
We also don't allow you or end-users of your application to generate the following types of content:
Hate
Content that expresses, incites, or promotes hate based on identity
Harassment
Content that intends to harass, threaten, or bully an individual
Violence
Content that promotes or glorifies violence or celebrates the suffering or humiliation of others
Self-harm
Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders
Sexual
Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness)
Political
Content attempting to influence the political process or to be used for campaigning purposes
Spam
Unsolicited bulk content
Deception
Content that is false or misleading, such as attempting to defraud individuals or spread disinformation
Malware
Content that attempts to generate ransomware, keyloggers, viruses, or other software intended to impose some level of harm
You can use our free moderation endpoint and safety best practices to help you keep your app safe.
That wraps up the second half of the "GET STARTED" section, covering Libraries, Models, Tutorials, and Usage policies.
Thank you for reading this far.
If you spot any mistakes, please let me know; corrections will be incorporated as they come in.