Quickstart: OpenAI
Purpose
The LLM Proxy is a microservice that handles forward proxying of requests of Large Language Models (LLMs).
The general goal is that, with Common Identity credentials, users and machine accounts will be able to access LLM APIs through requests to the LLM
proxy. All functionality and paths available through direct usage of LLM services, such as the Azure OpenAI APIs, should continue to operate as normal.
NOTE: Avoid sending Cisco, customer, partner, or other third-party confidential or highly confidential information, restricted data, or personal
information when using Azure OpenAI or AWS Bedrock unless your use case has been CASPR reviewed.
Usage
Getting an API token
The easiest way to get a token is to browse to the Webex Developer Portal. After logging in, you can copy your own bearer token from the
documentation. To obtain a token valid in both the dev and integration environments, go here.
OpenAI
Python Library (v1.0 and above)
from openai import AzureOpenAI

api_version = "2023-07-01-preview"
endpoint = "https://llm-proxy.us-east-2.int.infra.intelligence.webex.com/azure/v1"

client = AzureOpenAI(
    api_version=api_version,
    azure_endpoint=endpoint,
    api_key="<Your CI token here>",
)

completion = client.chat.completions.create(
    model="gpt-35-turbo",
    messages=[
        {"role": "user", "content": "Hello"},
    ],
)
print(completion.model_dump_json(indent=2))
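If you are on the 1.x SDK and need asynchronous calls, the library also provides an AsyncAzureOpenAI client. The snippet below is a minimal sketch of the same request in async form, assuming the same proxy endpoint and CI token as above:

import asyncio

from openai import AsyncAzureOpenAI

client = AsyncAzureOpenAI(
    api_version="2023-07-01-preview",
    azure_endpoint="https://llm-proxy.us-east-2.int.infra.intelligence.webex.com/azure/v1",
    api_key="<Your CI token here>",
)

async def main():
    completion = await client.chat.completions.create(
        model="gpt-35-turbo",
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(completion.model_dump_json(indent=2))

asyncio.run(main())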
Python Library (v0.28.1 and below)
Synchronous
import openai

openai.api_type = "azure_ad"
openai.api_base = "https://llm-proxy.us-east-2.int.infra.intelligence.webex.com/azure/v1"
openai.api_version = "2023-05-15"
openai.api_key = "<Your CI token here>"

openai.ChatCompletion.create(
    deployment_id="gpt-35-turbo",
    messages=[
        {"role": "system", "content": "You are a chatbot. You will only respond enthusiastically."},
        {"role": "user", "content": "Hi how are you?"},
        {"role": "assistant", "content": "I am doing absolutely amazing! How are you?"},
        {"role": "user", "content": "what do apples taste like?"},
    ],
    temperature=0.5,
    stream=False,
    n=5,
)
Event Loop
import asyncio
import openai

openai.api_type = "azure_ad"
openai.api_base = "https://llm-proxy.us-east-2.int.infra.intelligence.webex.com/azure/v1"
openai.api_version = "2023-05-15"
openai.api_key = "<Your CI token here>"

async def complete():
    # acreate is the async variant of create in openai<1.0
    return await openai.ChatCompletion.acreate(
        deployment_id="gpt-35-turbo",
        messages=[{"role": "user", "content": "Hello"}],
    )

print(asyncio.run(complete()))
API Proxy
curl "https://llm-proxy.us-east-2.int.infra.intelligence.webex.com/azure/v1/openai/deployments/<deployment-name>/chat/completions?api-version=2023-06-01-preview" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer INSERT_CI_TOKEN" \
  -d '{
    "messages": [{"role": "user", "content": "hi"}],
    "temperature": 0.5,
    "max_tokens": 4000,
    "top_p": 0.5
  }'
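The same request can also be made from Python without the SDK. The following is a minimal sketch using the requests library, with the deployment name left as a placeholder:

import requests

url = (
    "https://llm-proxy.us-east-2.int.infra.intelligence.webex.com"
    "/azure/v1/openai/deployments/<deployment-name>/chat/completions"
)
response = requests.post(
    url,
    params={"api-version": "2023-06-01-preview"},
    headers={"Authorization": "Bearer INSERT_CI_TOKEN"},
    json={
        "messages": [{"role": "user", "content": "hi"}],
        "temperature": 0.5,
        "max_tokens": 4000,
        "top_p": 0.5,
    },
)
print(response.json())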
Supported Deployments
Model                    Deployment name          Version
text-embedding-ada-002   text-embedding-ada-002   1
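To exercise the embeddings deployment above through the proxy, a request along these lines should work (a sketch using the 1.x SDK, assuming the same integration endpoint and CI token as the chat example):

from openai import AzureOpenAI

client = AzureOpenAI(
    api_version="2023-07-01-preview",
    azure_endpoint="https://llm-proxy.us-east-2.int.infra.intelligence.webex.com/azure/v1",
    api_key="<Your CI token here>",
)

embedding = client.embeddings.create(
    model="text-embedding-ada-002",
    input="Hello",
)
print(len(embedding.data[0].embedding))  # ada-002 returns 1536-dimensional vectors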
DALL-E
NOTE: Per this documentation, DALL-E in Azure OpenAI is not yet supported in the OpenAI 1.0 SDK. Make sure to use a version lower than 1.0.
Python Library
import openai

openai.api_type = "azure_ad"  # make sure to use azure_ad, not azure, for authorization to work properly
openai.api_base = "https://llm-proxy.us-east-2.dev.infra.intelligence.webex.com/azure/v1"  # base LLM proxy URL
openai.api_version = "2023-06-01-preview"
openai.api_key = "<Your CI token here>"

response = openai.Image.create(
    prompt="A big golden retriever, disposable camera",
    size="512x512",
    n=1,
)
image_url = response["data"][0]["url"]
print(image_url)
REST API
import requests
import json

url = (
    "https://llm-proxy.us-east-2.dev.infra.intelligence.webex.com"
    "/azure/v1/openai/images/generations:submit?api-version=2023-06-01-preview"
)
payload = json.dumps({
    "prompt": "a multi-colored umbrella on the beach, disposable camera",
    "size": "1024x1024",
    "n": 1
})
headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Bearer <your CI token here>'
}

# Submit the generation job; the image itself is retrieved in a follow-up request
response = requests.post(url, headers=headers, data=payload)
print(response.text)
When making a manual REST request, you do have to do a little extra work to get the actual image. The initial response from the API looks something like:
{
  "id": "random-job-token-here",
  "status": "notRunning"
}
with headers attached. One of the response headers, operation-location, gives the URL to poll for job status and, eventually, the resulting image.
import requests
import time

api_base = '<your_endpoint>'  # Enter your endpoint here - this will be the base LLM proxy URL
api_key = '<your_key>'        # Enter your API key (CI token) here

# Assign the API version (DALL-E is currently supported for the 2023-06-01-preview API version only)
api_version = '2023-06-01-preview'

# Call the API to generate the image and retrieve the response
headers = {'Authorization': f'Bearer {api_key}', 'Content-Type': 'application/json'}
submission = requests.post(
    f'{api_base}/openai/images/generations:submit?api-version={api_version}',
    headers=headers,
    json={'prompt': 'a multi-colored umbrella on the beach, disposable camera', 'size': '1024x1024', 'n': 1},
)

# Poll the operation-location header until the job succeeds
operation_location = submission.headers['operation-location']
status = ""
while status != "succeeded":
    time.sleep(1)
    response = requests.get(operation_location, headers=headers)
    status = response.json()['status']

image_url = response.json()['result']['data'][0]['url']
print(image_url)
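From there, downloading the image is one more GET request (a sketch; the output filename is arbitrary):

image = requests.get(image_url)
with open("generation.png", "wb") as f:
    f.write(image.content)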
AWS Bedrock
Python Library
Not supported yet. Coming soon!
API Proxy
Anthropic
curl --location 'https://llm-proxy.us-east-2.dev.infra.intelligence.webex.com/bedrock/v1/' \
--header 'Authorization: Bearer <Insert your CI Token here>' \
--header 'Content-Type: application/json' \
--data '{
"model": "anthropic.claude-v1",
"body" : {
"prompt":"Human: Hello who are you?\nAssistant:",
"max_tokens_to_sample":2048,
"temperature":0.5,
"top_k":250,
"top_p":1
}
}'
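Until SDK support lands, the proxy can also be called from Python. Below is a minimal sketch of the Anthropic request above using the requests library, with the same dev host and payload as the curl example:

import requests

response = requests.post(
    "https://llm-proxy.us-east-2.dev.infra.intelligence.webex.com/bedrock/v1/",
    headers={"Authorization": "Bearer <Insert your CI Token here>"},
    json={
        "model": "anthropic.claude-v1",
        "body": {
            "prompt": "Human: Hello who are you?\nAssistant:",
            "max_tokens_to_sample": 2048,
            "temperature": 0.5,
            "top_k": 250,
            "top_p": 1,
        },
    },
)
print(response.json())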
Amazon Titan
curl --location 'https://llm-proxy.us-east-2.dev.infra.intelligence.webex.com/bedrock/v1/' \
--header 'Authorization: Bearer <Insert your CI Token here>' \
--header 'Content-Type: application/json' \
--data '{
"body": {
"inputText": "Tell me what you can do.",
"textGenerationConfig": {
"maxTokenCount": 512,
"stopSequences": [],
"temperature": 0.7,
"topP": 1
}
},
"model": "amazon.titan-tg1-large"
}'
Supported Deployments
Google Vertex
Python Library
Not supported yet, sorry!
API Proxy
PaLM 2 for Text
curl --location 'https://llm-proxy.us-east-2.dev.infra.intelligence.webex.com/vertexai/v1/text-bison:predict' \
--header 'Authorization: Bearer <Insert your CI token here>' \
--header 'Content-Type: application/json' \
--data '{
"instances": [
{ "prompt": "Give me ten interview questions for the role of program manager."}
],
"parameters": {
"temperature": 0.2,
"maxOutputTokens": 256,
"topK": 40,
"topP": 0.95
}
}'
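PaLM 2 for Chat
The chat model follows the same pattern. The request below is a sketch that assumes chat-bison is exposed behind the same /vertexai/v1 path; the payload follows the standard Vertex AI chat schema:

curl --location 'https://llm-proxy.us-east-2.dev.infra.intelligence.webex.com/vertexai/v1/chat-bison:predict' \
--header 'Authorization: Bearer <Insert your CI token here>' \
--header 'Content-Type: application/json' \
--data '{
    "instances": [
        {
            "context": "You are a helpful assistant.",
            "messages": [
                { "author": "user", "content": "Give me ten interview questions for the role of program manager." }
            ]
        }
    ],
    "parameters": {
        "temperature": 0.2,
        "maxOutputTokens": 256,
        "topK": 40,
        "topP": 0.95
    }
}'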
Supported Deployments
Proxy Hosts
Environment: Production (US-EAST-2)
  Proxy Host: https://llm-proxy.us-east-2.intelligence.webex.com
  Paths: /azure/v1, /bedrock/v1
  Common Identity Host: https://idbroker.webex.com
  Access Control: Allow-listed machine accounts that have gone through RAI assessment

Environment: Production (EU-CENTRAL-1)
  Proxy Host: https://llm-proxy.eu-central-1.intelligence.webex.com
  Paths: /azure/v1, /bedrock/v1
  Common Identity Host: https://idbroker-eu.webex.com
  Access Control: Allow-listed machine accounts that have gone through RAI assessment

Environment: Integration
  Proxy Host: https://llm-proxy.us-east-2.int.infra.intelligence.webex.com
  Paths: /azure/v1, /bedrock/v1
  Common Identity Host: https://idbrokerbts.webex.com
  Access Control: All Cisco users and allow-listed machine accounts

Environment: Dev
  Proxy Host: https://llm-proxy.us-east-2.dev.infra.intelligence.webex.com
  Paths: /azure/v1, /bedrock/v1
  Common Identity Host: https://idbrokerbts.webex.com
  Access Control: All Cisco users and allow-listed machine accounts
Additional Links
INT/DEV Common Identity User Token Retrieval via Webex Developer Portal
Management APIs
Azure OpenAI is deployed as part of the Azure AI services. All Azure AI services rely on the same set of management APIs for create,
update, and delete operations. The management APIs are also used for deploying models within an OpenAI resource, and they can be used
to query quota, deployments, and so on.
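For example, listing the model deployments under an Azure OpenAI resource is a single management-plane GET. The sketch below uses placeholder subscription, resource group, and resource names, assumes the 2023-05-01 management API version, and requires an Azure (ARM) token rather than a CI token:

import requests

# Placeholders: fill in your own subscription, resource group, and resource name
url = (
    "https://management.azure.com/subscriptions/<subscription-id>"
    "/resourceGroups/<resource-group>/providers/Microsoft.CognitiveServices"
    "/accounts/<openai-resource>/deployments"
)
response = requests.get(
    url,
    params={"api-version": "2023-05-01"},
    headers={"Authorization": "Bearer <Azure ARM token>"},
)
print(response.json())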
Troubleshooting
Known Issues
Seeing exception AttributeError: 'str' object has no attribute 'get'
Resolution: This issue is caused by an invalid access token, which makes the auth service return an unparseable response. Fetch a new token from User
Token Retrieval via Webex Developer Portal and retry the request. We will be modifying the auth service response to return a more readable exception in
a future update.
My requests to Azure OpenAI are timing out!
Resolution: This is a known issue that Azure is actively looking to fix in their datacenters. There is a lot of demand for LLM usage in Azure, and their GPUs
are constantly being overtasked. Your request most likely hit one of those instances and therefore timed out. The best available workaround at the moment
is to implement retry logic if your request does not complete within 10 seconds, as in the sketch below.
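A minimal sketch of such retry logic, using the 1.x SDK's built-in timeout and retry options (the values shown are illustrative):

from openai import AzureOpenAI

client = AzureOpenAI(
    api_version="2023-07-01-preview",
    azure_endpoint="https://llm-proxy.us-east-2.int.infra.intelligence.webex.com/azure/v1",
    api_key="<Your CI token here>",
    timeout=10.0,   # give up on any single attempt after 10 seconds
    max_retries=3,  # the SDK re-issues requests that time out or fail transiently
)

completion = client.chat.completions.create(
    model="gpt-35-turbo",
    messages=[{"role": "user", "content": "Hello"}],
)
print(completion.choices[0].message.content)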
I want to automate rotation of my access token.
Resolution: The current standard practice is to provision a CI machine account from ServiceNow and to renew its token via the Common Identity APIs,
using the process described in the Get Access Token Script.
Don't see your issue?
Join the Ask LLM Proxy Webex Teams space and we will try to help out!