The Unified Gateway for Visual Intelligence.

Understand, reason over, and act on images, video, and documents with Orion — our flagship visual agent.

Loved by leading AI companies

Interact with images, videos, and documents through a single API.

Meet the agent that sees, reasons and acts

Frontier models like GPT, Claude, and Gemini can describe what they see – but they can’t act on it. Orion unites the reasoning power of large Vision-Language Models with the accuracy of specialized computer-vision tools – all through one unified API.

One interface for all your visual AI needs.

Images, documents, and videos – all through a single chat-completions interface.

Compose through conversation.

Chain visual operations like: detect → crop → enhance → analyze in a single conversation.
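A chained conversation of this kind can be sketched as a loop that sends one step per turn while keeping earlier turns as context. This is a minimal sketch, not Orion's documented workflow: `user_turn`, `run_chain`, and the `send` callable are hypothetical helpers, the example.com URL is a placeholder, and it assumes the image only needs to be attached on the first turn. In practice `send` would wrap a call such as `client.chat.completions.create(...)`.

```python
from typing import Callable, Dict, List, Optional

Message = Dict[str, object]


def user_turn(text: str, image_url: Optional[str] = None) -> Message:
    """Build one user message; the image is attached only when a URL is given."""
    content: List[Dict[str, object]] = [{"type": "text", "text": text}]
    if image_url is not None:
        content.append({"type": "image_url", "image_url": {"url": image_url}})
    return {"role": "user", "content": content}


def run_chain(
    send: Callable[[List[Message]], str],
    image_url: str,
    steps: List[str],
) -> List[Message]:
    """Send each step as a new turn, keeping earlier turns as context."""
    history: List[Message] = []
    for index, step in enumerate(steps):
        # Assumption: the image is attached once; later turns rely on history.
        history.append(user_turn(step, image_url if index == 0 else None))
        reply = send(history)
        history.append({"role": "assistant", "content": reply})
    return history


if __name__ == "__main__":
    # Stand-in for a real API call, so the sketch runs without network access.
    def fake_send(messages: List[Message]) -> str:
        return f"done ({len(messages)} messages seen)"

    steps = ["Detect the product.", "Crop to it.", "Enhance the crop.", "Analyze it."]
    for message in run_chain(fake_send, "https://example.com/shelf.jpg", steps):
        print(message["role"], "->", message["content"])
```

Passing the request function in as `send` keeps the chaining logic independent of any particular client, which also makes it easy to try with a stub before spending credits.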

Integrate at warp speed

Drop-in replacement for the OpenAI SDK.
Same API pattern – new visual powers.

Auditable outputs

Every response comes with visual proof. Build and integrate with confidence.

Ridiculously versatile.

Whatever your visual task – Orion knows how to act. Built with dozens of specialized computer-vision and multi-modal tools.

01

Image Intelligence

Caption & Tag

Generate rich, contextual descriptions and semantic labels for any image.

Detection

Generate rich, contextual descriptions and semantic labels for any image.

Segmentation

Generate rich, contextual descriptions and semantic labels for any image.

Pointing

Generate rich, contextual descriptions and semantic labels for any image.

Generate & Edit

Generate rich, contextual descriptions and semantic labels for any image.

UI Parsing

Generate rich, contextual descriptions and semantic labels for any image.

Image Tools

Generate rich, contextual descriptions and semantic labels for any image.

02

Document Intelligence

Document Parsing

Parse, summarize or split documents in seconds.

Structured OCR

Generate rich, contextual descriptions and semantic labels for any image.

Redaction

Generate rich, contextual descriptions and semantic labels for any image.

03

Video Intelligence

Caption & Tag

Generate comprehensive descriptions and metadata for video content.

Generate & Edit

Generate rich, contextual descriptions and semantic labels for any image.

Video Tools

Generate rich, contextual descriptions and semantic labels for any image.

OpenAI

from openai import OpenAI

# Point the standard OpenAI client at the Orion endpoint.
client = OpenAI(
    base_url="https://agent.vlm.run/v1/openai",
    api_key="<VLMRUN_API_KEY>",
)

# Send a text instruction and an image in a single user message.
result = client.chat.completions.create(
    model="vlmrun-orion-1",
    messages=[
        {"role": "user", "content": [
            {"type": "text", "text": "Analyze the image & animate into a video."},
            {"type": "image_url", "image_url": {"url": "https://..."}},
        ]}
    ],
)

print(result.choices[0].message.content)

For Developers

Designed for developers.

Familiar API, unfamiliar power

All your favorite vision tools, in a single box.

Drop-in replacement for the OpenAI SDK.

Handles images, documents, and videos via URL or upload.

Streaming support for real-time responses.

Structured outputs with Pydantic / Zod support.
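For the streaming item in the list above, a response can be consumed chunk by chunk as sketched here. The chunk shape (`choices[0].delta.content`) is the one the OpenAI Python SDK yields when `stream=True` is passed to `client.chat.completions.create`; that Orion streams in exactly this shape is an assumption based on the drop-in claim. `collect_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks stand in for a live stream so the sketch runs offline.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def collect_stream(chunks: Iterable) -> str:
    """Concatenate the text deltas from a chat-completions stream."""
    parts = []
    for chunk in chunks:
        # Some chunks carry no choices or no text (e.g. role-only deltas).
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
    return "".join(parts)


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Hypothetical stand-in that mimics the shape of one streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


if __name__ == "__main__":
    # With a real client this would be:
    #   stream = client.chat.completions.create(..., stream=True)
    stream = [fake_chunk("Two "), fake_chunk(None), fake_chunk("objects found.")]
    print(collect_stream(stream))
```

To show output as it arrives rather than at the end, print each `delta` inside the loop instead of joining the parts afterwards.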

Only pay for what you use

Granular usage billing keeps your costs in check.

100 credits = $1 USD

See credit-pricing table for more details.

Starter

$0

/mo

Pay-as-you-go

1 credit / image or page

1000 free sign-up credits

Up to 25K requests / month

Up to 10 requests / minute

Community Discord

Basic Usage Logs

Pro

$799

/mo

Usage-based

1 credit / image or page

100K credits included / month

Unlimited requests / month

Up to 100 requests / minute

Dedicated Slack support

Zero-Data Retention (ZDR)

Business Associate Agreement (BAA)

Enterprise

Custom

Invoiced Billing

Tier-based pricing

Volume discounts

Unlimited requests / month

Custom rate-limits

Dedicated Slack support

In-VPC Deployments, ZDR

SOC2, HIPAA, BAA Execution

Custom SLAs
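The credit figures above translate into a quick back-of-the-envelope estimate. This is a rough sketch under stated assumptions: a flat rate of 1 credit per image or page (the credit-pricing table may list other rates per tool), one item per request, and free sign-up credits consumed before any paid usage. The function names are hypothetical.

```python
import math

CREDITS_PER_USD = 100              # "100 credits = $1 USD"
CREDITS_PER_ITEM = 1               # "1 credit / image or page" (assumed flat)
STARTER_FREE_CREDITS = 1000        # "1000 free sign-up credits"
STARTER_REQUESTS_PER_MINUTE = 10   # "Up to 10 requests / minute"


def credits_to_usd(credits: int) -> float:
    """Convert credits to US dollars at the listed exchange rate."""
    return credits / CREDITS_PER_USD


def starter_cost_usd(items: int) -> float:
    """Estimated Starter cost for `items` images or pages after free credits."""
    billable = max(0, items * CREDITS_PER_ITEM - STARTER_FREE_CREDITS)
    return credits_to_usd(billable)


def min_minutes_at_rate(requests: int, per_minute: int) -> int:
    """Lower bound on wall-clock minutes when capped at `per_minute` requests."""
    return math.ceil(requests / per_minute)


if __name__ == "__main__":
    pages = 2500
    print(f"Starter cost for {pages} pages: ${starter_cost_usd(pages):.2f}")
    print(f"Minimum minutes at the Starter rate limit: "
          f"{min_minutes_at_rate(pages, STARTER_REQUESTS_PER_MINUTE)}")
```

Under these assumptions, 2,500 pages on Starter use 1,500 billable credits, or $15.00, and take at least 250 minutes at 10 requests per minute.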

For Enterprises

The new visual intelligence layer for your enterprise.

Deploy securely inside your VPC or private cloud – bringing visual intelligence directly to your infrastructure. Power document, image, and video understanding across teams. SOC 2 Type II and HIPAA-ready.

Frequently asked questions

FAQs

How is Orion different from GPT-5, Claude, or Gemini?

Frontier models can describe what they see — but not act on it. Orion goes beyond perception by planning, executing, and validating visual tasks. Instead of just describing an image, Orion can detect, crop, segment, generate, and analyze in sequence — reliably and deterministically.

What can I build with Orion?

Developers are already using Orion for a wide range of applications – from visual ETL pipelines that detect, extract, and structure information, to automated product and marketing asset generation, document parsing and redaction, video summarization and clipping, and even medical and geospatial visual analysis. If your workflow involves visual data, Orion can make it intelligent and interactive.

How is Orion priced?

Orion’s pricing is designed to be flexible and transparent — based on the tools you use and the volume of visual tasks you run. Each visual capability (detection, segmentation, OCR, generation, etc.) is priced per use, allowing you to scale from experimentation to production without committing to fixed model tiers. Enterprise plans include predictable monthly or on-prem options for teams that need volume pricing, VPC deployments, or compliance guarantees.

How accurate is Orion compared to traditional CV models?

Orion taps into both modern VLM architectures and traditional computer-vision models, allowing it to reason about and accurately perform visual tasks. In benchmarks across several multi-modal tasks (MMMU, MMBench, DocVQA, RefCOCO, etc.), Orion consistently outperformed leading VLMs on multi-step visual reasoning, structured OCR, and traditional CV tasks like detection, segmentation, and tracking. See our whitepaper for more details.

How do you keep data private?

Our Pro-tier offering runs entirely in our private cloud deployment. Requests made to our APIs are logged and made available in our observability dashboard. For our Enterprise tier, we can enforce stricter privacy requirements (SOC 2, GDPR, HIPAA).

Can I run Orion on-prem or in-VPC?

Yes. For enterprise deployments, VLM Run offers both VPC Peering and In-VPC hosting options — ensuring data never leaves your environment. We’re SOC 2 Type II and HIPAA-ready for teams with compliance requirements.

Visual Intelligence for Enterprise.

by Autonomi AI Inc. All rights reserved. © 2025
