The landscape of Large Language Models (LLMs) is evolving at breakneck speed. OpenAI's GPT series, Anthropic's Claude, Google's Gemini, Meta's Llama, and numerous other powerful models offer unique capabilities. However, integrating these diverse APIs into applications can be cumbersome. Each provider has its own SDK, authentication mechanism, and sometimes subtle API differences, leading to complex and fragmented codebases.
What if you could interact with all these models through a single, standardized interface? Imagine using the familiar OpenAI API structure and SDKs to call Claude, Gemini, or Groq models seamlessly. This is where an API proxy becomes invaluable.
An API proxy acts as an intermediary, sitting between your application (the client) and one or more backend services (the LLM APIs). It receives requests, potentially modifies them, forwards them to the appropriate backend based on certain rules, and returns the response to the client.
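To make this concrete before we build the full version, here is a minimal, hypothetical sketch of the pattern as a Cloudflare Worker: receive a request, forward it to a backend, and relay the response. The `BACKEND_URL` and `BACKEND_API_KEY` bindings are illustrative assumptions, not part of the project we deploy below.

```ts
// Minimal illustration of a proxy: receive a request, forward it to a
// backend, and relay the response. BACKEND_URL and BACKEND_API_KEY are
// hypothetical bindings used only for this sketch.
export interface Env {
  BACKEND_URL: string
  BACKEND_API_KEY: string
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const url = new URL(request.url)
    // Rebuild the request against the backend, swapping in the real API key.
    const upstream = new Request(env.BACKEND_URL + url.pathname, {
      method: request.method,
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${env.BACKEND_API_KEY}`,
      },
      body: request.body,
    })
    // Return the backend's response to the original caller unchanged.
    return fetch(upstream)
  },
}
```

Everything that follows is a more capable version of this same loop, with provider selection, authentication, and streaming layered on top.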
This article will guide you through building a sophisticated yet easy-to-deploy LLM API proxy using Cloudflare Workers. Based on the open-source `openai-api-proxy` project, the proxy provides unified OpenAI-compatible `/v1/chat/completions` and `/v1/models` endpoints, allowing you to route requests to various LLM providers like OpenAI, Anthropic, Google Gemini, Groq, DeepSeek, Azure OpenAI, and more, simply by changing the `model` parameter in your request.
Why Build an LLM Proxy?
- Unified Interface: Use OpenAI's well-documented API structure and existing SDKs to interact with multiple different LLM backends. This simplifies client-side development significantly.
- Simplified Authentication: Manage API keys for various LLM services centrally within the proxy's secure environment (Cloudflare secrets), exposing only a single proxy API key to your client applications.
- Centralized Control: Implement cross-cutting concerns like rate limiting, logging, caching, or request/response modification in one place.
- Flexibility & Future-Proofing: Easily switch between models or add support for new providers without changing client-side code, just by updating the proxy configuration and logic.
- Cost-Effectiveness (with Cloudflare Workers): Leverage Cloudflare's generous free tier and global edge network for low-latency, scalable, and affordable deployment.
Understanding the "OpenAPI Proxy" Concept
While often referred to as an "OpenAPI Proxy," it's important to clarify the terminology in this context. The proxy exposes an interface that mimics the structure defined by OpenAI's API (which itself is described using the OpenAPI specification). It doesn't necessarily parse an OpenAPI schema file dynamically. Its primary function is to translate requests made to its OpenAI-like endpoints into the appropriate format for the target backend LLM API (which may or may not strictly adhere to OpenAI's specific schema). The key benefit is the standardized front-facing interface it provides.
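As a rough illustration of that translation step, the sketch below maps an OpenAI-style chat request onto the general shape of Anthropic's Messages API. The mapping is deliberately minimal and is not the project's actual conversion code; real adapters also handle tool calls, streaming chunks, and error formats.

```ts
// Hypothetical, minimal mapping from an OpenAI-style chat request to an
// Anthropic Messages API payload. Real conversions handle system prompts,
// tool calls, streaming, and error shapes in more detail.
interface OpenAIChatRequest {
  model: string
  messages: { role: 'system' | 'user' | 'assistant'; content: string }[]
  max_tokens?: number
  temperature?: number
}

function toAnthropicPayload(req: OpenAIChatRequest) {
  // Anthropic expects the system prompt as a separate top-level field.
  const system = req.messages.find((m) => m.role === 'system')?.content
  const messages = req.messages
    .filter((m) => m.role !== 'system')
    .map((m) => ({ role: m.role as 'user' | 'assistant', content: m.content }))

  return {
    model: req.model,
    system,
    messages,
    // Anthropic requires max_tokens; fall back to a default for this sketch.
    max_tokens: req.max_tokens ?? 1024,
    temperature: req.temperature,
  }
}
```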
Technology Stack
- Cloudflare Workers: A serverless execution environment running on Cloudflare's edge network. Perfect for low-latency API proxies.
- Hono: A lightweight, fast web framework designed for edge environments like Cloudflare Workers.
- TypeScript: Provides static typing for more robust and maintainable code.
- Node.js & pnpm: For development environment and package management (the original repo uses pnpm).
Tutorial: Building Your Multi-Provider LLM Proxy
This tutorial walks through setting up, understanding, configuring, deploying, and testing the `openai-api-proxy` project.
Prerequisites
- Node.js: Ensure you have Node.js (ideally a recent LTS version) installed.
- pnpm: This project uses `pnpm` for package management. Install it globally with `npm install -g pnpm`.
- Cloudflare Account: Sign up for a free Cloudflare account if you don't have one.
- Wrangler CLI: Cloudflare's command-line tool for managing Workers. Install it with `pnpm install -g wrangler` and log in with `wrangler login`.
- Git: Required for cloning the repository.
- (Optional but Recommended) API Keys: Obtain API keys for the LLM services you intend to use (e.g., OpenAI, Anthropic, Google Cloud/Vertex AI, Groq).
Step 1: Project Setup
First, clone the repository and install the dependencies:
```bash
# Clone the repository
git clone https://github.com/rxliuli/openai-api-proxy.git

# Navigate into the project directory
cd openai-api-proxy

# Install dependencies using pnpm
pnpm install
```
This sets up the project with all necessary libraries, including Hono and the specific LLM client logic.
Step 2: Understanding the Core Logic (`src/index.ts`)
The heart of the proxy lies in `src/index.ts`. Let's break down its key components:
- Framework (Hono): The code initializes a Hono app with `const app = new Hono<{ Bindings: Bindings }>()`. Hono provides routing, middleware handling, and request/response utilities optimized for edge environments.
- Modular LLM Providers: Notice imports like `import { openai } from './llm/openai'`, `import { anthropic } from './llm/anthropic'`, etc. The logic for interacting with each specific LLM provider (handling its unique API endpoints, authentication, and request/response formats) is encapsulated within these modules in the `src/llm/` directory. We won't dive into each module, but understand that each exports an object containing (a rough sketch of this shape follows the list below):
  - `name`: The provider's name (e.g., 'openai', 'anthropic').
  - `supportModels`: An array of model strings this provider handles (e.g., `['gpt-4o-mini']`, `['claude-3-5-sonnet-20240620']`).
  - `requiredEnv`: An array of environment variable names needed for this provider (e.g., `['OPENAI_API_KEY']`).
  - `invoke()`: An async function that handles non-streaming chat completion requests.
  - `stream()`: An async generator function that handles streaming chat completion requests.
- Dynamic Provider Loading (`getModels`):

  ```ts
  function getModels(env: Record<string, string>) {
    return [
      openai(env),
      anthropic(env),
      // ... other providers
    ].filter((it) => it.requiredEnv.every((it) => it in env))
  }
  ```

  This crucial function initializes all potential provider modules, passing the environment variables (`env`) to them. It then filters the list, keeping only the providers whose required environment variables (API keys, endpoints, etc.) are actually present in the deployed Worker's environment. This means the proxy only activates backends that you have configured with secrets/variables.
- Middleware: Hono applications use middleware extensively:
  - CORS: `app.use(cors(...))` handles Cross-Origin Resource Sharing. It is configured to allow origins specified in the `CORS_ORIGIN` environment variable, enabling web applications hosted on permitted domains to call the proxy.
  - Error Handling: A simple middleware catches errors, serializes them, and throws an `HTTPException` for standardized error responses.
  - Proxy Authentication:

    ```ts
    app.use(async (c, next) => {
      if (!c.env.API_KEY) { /* ... error ... */ }
      if (`Bearer ${c.env.API_KEY}` !== c.req.header('Authorization')) { /* ... error ... */ }
      return next()
    })
    ```

    This middleware protects the proxy itself. It checks that the `API_KEY` environment variable is set and validates that incoming requests include a matching `Authorization: Bearer <your_proxy_api_key>` header. This prevents unauthorized use of your proxy endpoint.
- Routing:
  - `/v1/chat/completions` (POST): This is the main workhorse.
    - It reads the JSON request body sent by the client.
    - It calls `getModels(c.env)` to get the list of enabled providers based on the current environment configuration.
    - It finds the correct provider with `list.find((it) => it.supportModels.includes(req.model))`: it looks at the `"model"` field in the client's request (e.g., `"gpt-4o-mini"`, `"claude-3-haiku-20240307"`) and selects the enabled provider module that lists this model in its `supportModels`.
    - If no enabled provider supports the requested model, it returns a 400 error.
    - If the request body includes `"stream": true`, it calls the provider's `llm.stream()` method and uses `streamSSE` to send the response back as Server-Sent Events.
    - Otherwise (for non-streaming requests), it calls the provider's `llm.invoke()` method and returns the complete JSON response.
  - `/v1/models` (GET): This route lists all models available through the currently configured and enabled providers on the proxy.
    - It calls `getModels(c.env)`.
    - It iterates through the enabled providers and their `supportModels`.
    - It formats this information into the standard OpenAI `/v1/models` response structure, indicating which provider (`owned_by`) offers each model (`id`). This allows OpenAI-compatible clients to discover which models they can request through the proxy.
  - `/v1/chat/completions` (OPTIONS): Handles CORS preflight requests needed by browsers.
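Pulling the description above together, the provider contract looks roughly like the following TypeScript interface. The type names and exact signatures are approximations for illustration rather than code copied from the repository.

```ts
// Approximate shape of a provider module, inferred from the description above.
// The exact types in the repository may differ.
interface ChatCompletionRequest {
  model: string
  messages: { role: string; content: string }[]
  stream?: boolean
  temperature?: number
}

interface LLMProvider {
  // Provider identifier, e.g. 'openai' or 'anthropic'.
  name: string
  // Model identifiers this provider can serve, e.g. ['gpt-4o-mini'].
  supportModels: string[]
  // Environment variables that must be present for the provider to activate.
  requiredEnv: string[]
  // Non-streaming completion: resolves to a full OpenAI-style response object.
  invoke(req: ChatCompletionRequest): Promise<unknown>
  // Streaming completion: yields OpenAI-style chunks for Server-Sent Events.
  stream(req: ChatCompletionRequest, signal?: AbortSignal): AsyncGenerator<unknown>
}
```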
Step 3: Configuration (`wrangler.toml` & Secrets)
Configuration happens in two main places:
- `wrangler.toml`: This file defines the Worker's settings for Cloudflare.

  ```toml
  name = "openai-api-proxy"          # Choose a unique name for your worker
  main = "src/index.ts"              # Entry point script
  compatibility_date = "2024-08-22"  # Use a recent date

  # Enable observability for logs/analytics in Cloudflare dashboard
  [observability]
  enabled = true
  ```

  You might customize the `name`. The `main` and `compatibility_date` values should generally be kept as they are in the repository unless you have a specific reason to change them.
- Environment Variables / Secrets (Crucial!): This is where you configure the proxy's own security and enable the backend LLM providers. For security, always set these as Secrets in the Cloudflare dashboard rather than as plain-text environment variables in `wrangler.toml`.
  - `API_KEY` (Required): This is the secret key you define to protect access to your proxy. Any client calling your deployed Worker URL must provide this key in the `Authorization: Bearer <API_KEY>` header. Choose a strong, random string.
  - `CORS_ORIGIN` (Optional): If you need to call the proxy from a browser-based application, set this to the origin of your web app (e.g., `https://myapp.example.com`). You can also use `*` during development, but be cautious in production.
  - Backend Provider Keys (Required for each provider you want to use): Set the secrets corresponding to the LLM providers you want to enable. Refer to the project's `README.md` for the exact variable names required for each provider. For example:
    - To enable OpenAI: set `OPENAI_API_KEY` with your OpenAI API key.
    - To enable Anthropic: set `ANTROPIC_API_KEY` with your Anthropic API key.
    - To enable Google Gemini: set `GOOGLE_GEN_AI_API_KEY`.
    - To enable Groq: set `GROQ_API_KEY`.
    - To enable Azure OpenAI: set `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_ENDPOINT`, `AZURE_API_VERSION`, and potentially `AZURE_DEPLOYMENT_MODELS`.
    - ...and so on for other providers listed in the README.

  Remember: the proxy will only route requests to backends for which the corresponding secrets are correctly set in the Cloudflare Worker's environment. A sketch of the corresponding typed bindings follows below.
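For orientation, the Worker's typed bindings for these variables might look roughly like this. The actual `Bindings` type lives in the repository; this sketch lists only a subset and treats everything except `API_KEY` as optional.

```ts
// Illustrative subset of the environment bindings discussed above.
// A provider is only activated when all of its required variables are present.
type Bindings = {
  API_KEY: string                 // Key clients must send to the proxy itself
  CORS_ORIGIN?: string            // Allowed browser origin(s), if any
  OPENAI_API_KEY?: string         // Enables the OpenAI backend
  ANTROPIC_API_KEY?: string       // Enables the Anthropic backend (name as listed in the README)
  GOOGLE_GEN_AI_API_KEY?: string  // Enables Google Gemini
  GROQ_API_KEY?: string           // Enables Groq
  AZURE_OPENAI_API_KEY?: string   // Azure OpenAI, together with the settings below
  AZURE_OPENAI_ENDPOINT?: string
  AZURE_API_VERSION?: string
}
```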
Step 4: Deployment
- Set Secrets: Go to your Cloudflare dashboard: Workers & Pages -> Select your Worker (or create one if deploying for the first time) -> Settings -> Variables -> Environment Variables -> Edit variables -> Add Secret. Add the `API_KEY` secret and a secret for each LLM backend you want to enable (e.g., `OPENAI_API_KEY`, `ANTROPIC_API_KEY`).
- Deploy: Run the deployment command from your project directory:

  ```bash
  wrangler deploy
  ```

  Wrangler will build the project and deploy it to your Cloudflare account. It will output the URL where your proxy is now live (e.g., `https://openai-api-proxy.<your-subdomain>.workers.dev`).
Step 5: Testing
You can now test your deployed proxy using `curl` or any HTTP client (like Postman or Insomnia). Replace `<YOUR_WORKER_URL>` with the URL output by `wrangler deploy` and `<YOUR_PROXY_API_KEY>` with the value you set for the `API_KEY` secret.
- Check Available Models:

  ```bash
  curl <YOUR_WORKER_URL>/v1/models \
    -H "Authorization: Bearer <YOUR_PROXY_API_KEY>"
  ```

  This should return a JSON list of all models supported by the providers you configured with secrets.
- Send a Chat Completion Request (e.g., to OpenAI if configured):

  ```bash
  curl <YOUR_WORKER_URL>/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer <YOUR_PROXY_API_KEY>" \
    -d '{
      "model": "gpt-4o-mini",
      "messages": [{"role": "user", "content": "Explain API proxies in simple terms."}],
      "temperature": 0.7
    }'
  ```

- Send a Chat Completion Request (e.g., to Anthropic if configured):

  ```bash
  curl <YOUR_WORKER_URL>/v1/chat/completions \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer <YOUR_PROXY_API_KEY>" \
    -d '{
      "model": "claude-3-5-sonnet-20240620",
      "messages": [{"role": "user", "content": "Explain API proxies in simple terms."}],
      "temperature": 0.7
    }'
  ```

  Notice that the only change needed to target a different backend (assuming both OpenAI and Anthropic secrets are configured) is the `"model"` field value.
- Using OpenAI SDKs: You can point existing OpenAI SDKs to your proxy (a streaming variant is sketched after this list):

  ```ts
  import OpenAI from 'openai';

  const openai = new OpenAI({
    baseURL: '<YOUR_WORKER_URL>/v1', // Point to your proxy URL
    apiKey: '<YOUR_PROXY_API_KEY>',  // Use the proxy's API key
  });

  async function main() {
    const response = await openai.chat.completions.create({
      // Use any model identifier configured and available through your proxy
      model: 'groq/llama3-8b-8192', // Example: Using Groq via the proxy
      messages: [{ role: 'user', content: 'Write a short poem about edge computing.' }],
      stream: false,
    });
    console.log(response.choices[0].message.content);
  }

  main();
  ```
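Streaming works through the same interface. The sketch below assumes the same placeholder URL and key, uses the official `openai` SDK's streaming iterator, and picks `gpt-4o-mini` purely as an example; any model your configured providers expose would work.

```ts
import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: '<YOUR_WORKER_URL>/v1', // Your proxy URL
  apiKey: '<YOUR_PROXY_API_KEY>',  // The proxy's API key
});

async function main() {
  // stream: true makes the proxy call the provider's stream() method and
  // relay the result as Server-Sent Events, which the SDK exposes as an
  // async iterator of chunks.
  const stream = await client.chat.completions.create({
    model: 'gpt-4o-mini', // Any model your configured providers support
    messages: [{ role: 'user', content: 'Stream a haiku about proxies.' }],
    stream: true,
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
  }
}

main();
```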
Advanced Considerations & Enhancements
While this proxy provides a powerful foundation, consider these potential enhancements:
- Rate Limiting: Implement rate limiting directly in the Worker using Cloudflare's Rate Limiting product or custom logic (e.g., using a KV store) to prevent abuse and manage costs; a minimal sketch follows this list.
- Caching: Cache responses for identical requests (where appropriate) using the Cloudflare Cache API or KV store to improve performance and reduce backend calls.
- Logging & Monitoring: Leverage Cloudflare's built-in Worker observability or integrate third-party logging services for better insights into usage and errors.
- Input/Output Validation: Add more robust validation of request payloads and potentially sanitize responses.
- User-Specific Keys: Modify the authentication to support multiple API keys, perhaps mapping different client keys to different backend configurations or rate limits.
- More Endpoints: Extend the proxy to support other OpenAI API endpoints like `/v1/embeddings` by adding corresponding routes and provider logic.
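As a concrete example of the rate-limiting idea above, here is a minimal, hypothetical fixed-window limiter written as Hono middleware backed by Workers KV. The `RATE_LIMIT_KV` binding, the one-minute window, and the limit of 60 requests per key are all assumptions for this sketch, not part of the project.

```ts
import { createMiddleware } from 'hono/factory'
import { HTTPException } from 'hono/http-exception'

// Hypothetical fixed-window rate limiter: at most 60 requests per minute
// per API key, tracked in a KV namespace bound as RATE_LIMIT_KV.
export const rateLimit = createMiddleware<{
  Bindings: { RATE_LIMIT_KV: KVNamespace }
}>(async (c, next) => {
  const key = c.req.header('Authorization') ?? 'anonymous'
  const windowKey = `rl:${key}:${Math.floor(Date.now() / 60_000)}`

  const current = Number((await c.env.RATE_LIMIT_KV.get(windowKey)) ?? '0')
  if (current >= 60) {
    throw new HTTPException(429, { message: 'Rate limit exceeded' })
  }

  // Best-effort counter; KV is eventually consistent, so treat this as a
  // soft limit rather than an exact one.
  await c.env.RATE_LIMIT_KV.put(windowKey, String(current + 1), {
    expirationTtl: 120,
  })

  return next()
})
```

You would register it with `app.use(rateLimit)` before the chat completion routes; for stricter guarantees, Durable Objects or Cloudflare's Rate Limiting rules are a better fit than eventually consistent KV.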
Conclusion
Building an API proxy, especially for unifying diverse LLM services, might seem daunting, but frameworks like Hono and serverless platforms like Cloudflare Workers make it remarkably accessible. The `openai-api-proxy` project provides an excellent starting point, demonstrating how to route requests based on model identifiers and manage multiple backend credentials securely. By deploying your own proxy, you gain a standardized interface, centralized control, and the flexibility to leverage the best models for your needs without constantly refactoring your client applications. This approach empowers developers to navigate the dynamic AI landscape with greater ease and efficiency.