
Moderation

Overview
The moderation endpoint is a tool you can use to check whether content complies with
OpenAI's usage policies. Developers can thus identify content that our usage policies
prohibit and take action, for instance by filtering it.

The model classifies content into the following categories:

CATEGORY           DESCRIPTION

hate               Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste.

hate/threatening   Hateful content that also includes violence or serious harm towards the targeted group.

self-harm          Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.

sexual             Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).

sexual/minors      Sexual content that includes an individual who is under 18 years old.

violence           Content that promotes or glorifies violence or celebrates the suffering or humiliation of others.

violence/graphic   Violent content that depicts death, violence, or serious physical injury in extreme graphic detail.

The moderation endpoint is free to use when monitoring the inputs and outputs of OpenAI
APIs. We currently do not support monitoring of third-party traffic.

We are continuously working to improve the accuracy of our classifier and are
especially working to improve the classifications of hate, self-harm, and
violence/graphic content. Our support for non-English languages is currently
limited.

Quickstart
To obtain a classification for a piece of text, make a request to the moderation endpoint as
demonstrated in the following code snippets:

Example: Getting moderations (curl)

curl https://api.openai.com/v1/moderations \
  -X POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{"input": "Sample text goes here"}'

Below is an example output of the endpoint. It returns the following fields:

flagged: Set to true if the model classifies the content as violating OpenAI's usage
policies, false otherwise.
categories: Contains a dictionary of per-category binary violation flags. For each
category, the value is true if the model flags the corresponding category as
violated, false otherwise.
category_scores: Contains a dictionary of per-category raw scores output by the
model, denoting the model's confidence that the input violates OpenAI's policy for
the category. The value is between 0 and 1, where higher values denote higher confidence.
The scores should not be interpreted as probabilities.

{
  "id": "modr-XXXXX",
  "model": "text-moderation-001",
  "results": [
    {
      "categories": {
        "hate": false,
        "hate/threatening": false,
        "self-harm": false,
        "sexual": false,
        "sexual/minors": false,
        "violence": false,
        "violence/graphic": false
      },
      "category_scores": {
        "hate": 0.18805529177188873,
        "hate/threatening": 0.0001250059431185946,
        "self-harm": 0.0003706029092427343,
        "sexual": 0.0008735615410842001,
        "sexual/minors": 0.0007470346172340214,
        "violence": 0.0041268812492489815,
        "violence/graphic": 0.00023186142789199948
      },
      "flagged": false
    }
  ]
}
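
As an illustration (not part of the official documentation), a response like the one above can be inspected in Python roughly as follows, reusing the result dictionary parsed in the earlier sketch:

# Inspect the first (and only) result for a single input string.
moderation = result["results"][0]

if moderation["flagged"]:
    # At least one category flag is true; list the violated categories.
    violated = [name for name, hit in moderation["categories"].items() if hit]
    print("Content flagged for:", ", ".join(violated))
else:
    print("Content passed moderation.")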

OpenAI will continuously upgrade the moderation endpoint's underlying model.
Therefore, custom policies that rely on category_scores may need recalibration
over time.
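
For example, a custom policy might flag content whenever a raw score exceeds a hand-picked threshold. The sketch below is purely illustrative: the threshold values are hypothetical, and since the scores are not probabilities and the underlying model changes, they would need to be re-tuned against labeled examples after each model upgrade.

# Hypothetical per-category thresholds; illustrative values only, to be
# recalibrated against labeled data whenever the underlying model changes.
THRESHOLDS = {
    "violence": 0.5,
    "violence/graphic": 0.4,
}

def violates_custom_policy(category_scores):
    """Return True if any monitored category's raw score exceeds its threshold."""
    return any(
        category_scores.get(category, 0.0) > limit
        for category, limit in THRESHOLDS.items()
    )

# Example with the category_scores from the sample response above:
sample_scores = {
    "violence": 0.0041268812492489815,
    "violence/graphic": 0.00023186142789199948,
}
print(violates_custom_policy(sample_scores))  # False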
