Unified API for LLM inference.

Every leading LLM behind one drop-in OpenAI-compatible API. Open source LLMs run on Runware's own infrastructure at up to 80% lower cost.


20+ leading LLMs

1 endpoint

OpenAI-compatible

POST /v1/chat/completions · 92 ms TTFT

Model: zai:[email protected]

Intelligence: 76 / 100

Why Runware

Ship faster. Spend less.

Run LLM inference on infrastructure built for production. Leave reliability, scale, and operations to us.

Up to 80%

Lower cost

Our open source LLM API runs on Runware infrastructure. Closed source LLMs are passed through at market-leading rates.

Predictable

Latency

Stable p50 and p95 under real production traffic. Tuned at the hardware level.

No caps

No subs, no ceilings

No subscriptions to commit to. Pay only for what you use.

99.99%

Target uptime

Built on the Sonic Inference Engine. Live status at status.runware.ai.

Available LLMs

Open source and closed source, same endpoint.

Leading closed source LLMs from Anthropic, OpenAI, Google and xAI alongside the best open source LLMs running on Runware's own hardware. Pick by quality, cost or context window. Same auth either way.

Open source · Runware infrastructure

Top open source LLMs on custom hardware.

Frontier multimodal agentic model with 1M context for coding

DeepSeek-V4-Flash

deepseek:v4@flash

1M ctx · tool use

DeepSeek-V4-Pro

deepseek:v4@pro

High-capability frontier LLM with 1M context

moonshotai:[email protected]

Multimodal · agents

zai:[email protected]

google:gemma@4-31b

Open 31B multimodal reasoning model for coding

MiniMax M2.7 Highspeed

minimax:m2.7@highspeed

Performance-tuned

alibaba:[email protected]

RL-trained agentic

zai:[email protected]

73.8% SWE-bench

alibaba:[email protected]

262K ctx · workhorse

Closed source · Pass-through pricing

Leading closed source LLMs at market leading rates.

google:[email protected]

anthropic:claude@fable-5

Frontier multimodal agentic model for long-horizon coding

Claude Opus 4.8

anthropic:[email protected]

Frontier multimodal reasoning model for advanced coding

Gemini 3.5 Flash

google:[email protected]

Frontier multimodal reasoning model for agentic and coding workflows

xai:[email protected]

openai:[email protected]

Claude Opus 4.7

anthropic:[email protected]

openai:[email protected]

openai:[email protected]

openai:[email protected]

openai:[email protected]

Enterprise-grade reasoning LLM optimized for high-performance professional workloads

Gemini 3.1 Flash Lite

google:[email protected]

Multimodal · fast

Use cases

Match the model to the job.

Every workload has a model that suits it. Pick a task to see what we'd reach for.

Coding agents & engineering

Refactor at codebase scale, generate tests, run autonomous tickets end to end. Models here lead public SWE-bench Verified and Pro deployments, with strong tool calling and long-horizon execution.

Recommended models

GLM-4.7 · Open · ~9x cheaper than Sonnet 4.6
zai:[email protected]
Best open source coder. 73.8% on SWE-bench Verified at a fraction of closed source pricing.
Claude Sonnet 4.6 · Closed
anthropic:[email protected]
Production default for coding agents. Strong on branched, multi-step tickets and computer-use scaffolds.
MiniMax M2.7 · Open · ~13x cheaper than Sonnet 4.6
minimax:m2.7@0
Reliable agentic coding with strong instruction-following on long-horizon tool workflows.

Why choose OSS

Open source has closed the gap.

On most production workloads, open source LLMs now match closed source on quality, at a fraction of the cost. On some, they're the strongest available choice.

GLM-4.7

zai:[email protected]

358B MoE with interleaved thinking. Scores 73.8% on SWE-bench Verified at a fraction of closed source pricing.

Qwen3.5-397B

alibaba:[email protected]

Sparse MoE at 397B total / 17B active. 262K native context, extensible to ~1M. Built for long-context reasoning at scale.

Kimi K2.6

moonshotai:[email protected]

Top open source option for multimodal agents. Image and video understanding alongside long-horizon software tasks.

MiniMax M2.7

minimax:m2.7@0

Long-context agentic coding tuned for production tool use. Holds quality at high throughput.

Intelligence vs cost

Same intelligence band. A fraction of the cost.

Model · ID · Intelligence · Cost · Saving

Open source

MiniMax M2.5 · minimax:m2.5@0 · 80.2 · $0.95 · ~16x cheaper

DeepSeek-V4-Flash · deepseek:v4@flash · 79.0 · $0.28 · ~54x cheaper

Qwen3.5-397B · alibaba:[email protected] · 76.4 · $3.20 · ~5x cheaper

GLM-4.7 · zai:[email protected] · 73.8 · $1.75 · ~9x cheaper

Closed source

Gemini 3.1 Pro · google:[email protected] · 87.9 · $12

Claude Opus 4.7 · anthropic:[email protected] · 87.6 · $25

GPT-5.5 · openai:[email protected] · 85.1 · $30

Claude Sonnet 4.6 · anthropic:[email protected] · 80.8 · $15

Intelligence: SWE-bench Verified scores from public May 2026 model reports. Cost: Runware list price per million output tokens, pulled live from the model catalog. Open source models run on Runware's own hardware; closed source passed through at market leading rates.
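As a rough check on the "x cheaper" figures, the sketch below divides a closed source output price by each open source output price listed above. It assumes the baseline is Claude Sonnet 4.6 at $15 per million output tokens, which matches the "cheaper than Sonnet 4.6" wording used elsewhere on this page; the variable and function names are illustrative only.

```python
# Output prices per million tokens, copied from the list above.
OPEN_SOURCE_PRICES = {
    "MiniMax M2.5": 0.95,
    "DeepSeek-V4-Flash": 0.28,
    "Qwen3.5-397B": 3.20,
    "GLM-4.7": 1.75,
}

# Assumption: the comparison baseline is Claude Sonnet 4.6 at $15.
BASELINE_PRICE = 15.00


def times_cheaper(open_price: float, closed_price: float = BASELINE_PRICE) -> float:
    """Return how many times cheaper the open source price is."""
    return closed_price / open_price


ratios = {name: round(times_cheaper(price)) for name, price in OPEN_SOURCE_PRICES.items()}

for name, ratio in ratios.items():
    print(f"{name}: ~{ratio}x cheaper")
```

The ratio covers output tokens only. A real bill also depends on the input/output mix of the workload.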

What open source buys you

A fraction of the cost to run. Auditable weights. No behaviour changes overnight, no surprise deprecations under your stack.

When closed source still wins

The most demanding reasoning, complex agent orchestration, and computer use. For most other production work, open source is the right default, and the right place to start.

LLM API pricing

What you'd save by switching to open source.

Pick a workload and a monthly token volume. We'll substitute the closed source model you're using today with an equivalent open source model on Runware, and show the monthly saving at your scale.

Full LLM pricing

Monthly volume: 50M tokens

Volume options: 1M · 10M · 100M · 1B

Comparison against the equivalent closed source model billed direct from the provider. Real Runware list prices, blended across a representative input/output mix for the selected workload.

Monthly · 50M tokens

Open source on Runware: GLM-4.7 on Runware, $54/mo

Closed source direct: Claude Sonnet 4.6 direct, $451/mo

Monthly saving: $397 · 88% less
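The saving shown above follows from simple arithmetic on the two monthly totals. The sketch below reproduces it and derives the implied blended rate per million tokens. The function names are hypothetical, and the blended rate is back-calculated from the totals rather than taken from the price list.

```python
from typing import Tuple


def monthly_saving(open_cost: float, closed_cost: float) -> Tuple[float, int]:
    """Return the absolute saving and the saving as a whole-number percentage."""
    saving = closed_cost - open_cost
    percent = round(100 * saving / closed_cost)
    return saving, percent


def blended_rate(monthly_cost: float, million_tokens: float) -> float:
    """Implied blended price per million tokens for a given monthly total."""
    return monthly_cost / million_tokens


# Figures from the 50M-token example above.
saving, percent = monthly_saving(open_cost=54, closed_cost=451)
print(f"Monthly saving: ${saving:.0f} ({percent}% less)")
print(f"GLM-4.7 blended rate: ${blended_rate(54, 50):.2f} per 1M tokens")
print(f"Claude Sonnet 4.6 blended rate: ${blended_rate(451, 50):.2f} per 1M tokens")
```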

OpenAI-compatible LLM API

Already on the OpenAI SDK? Change two values.

The Runware text endpoint speaks the OpenAI protocol. The request shape, streaming format and SDKs you already use carry over unchanged. Point your existing client at https://api.runware.ai/v1, swap the API key and pick a Runware model.

POST /v1/chat/completions

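from openai import OpenAI

# Standard OpenAI client. Only the API key and base URL point at Runware.
client = OpenAI(
    api_key="your_runware_api_key",
    base_url="https://api.runware.ai/v1",
)

# Same chat completions call as with OpenAI, using a Runware model ID.
response = client.chat.completions.create(
    model="minimax:m2.7@0",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_completion_tokens=256,
)

# The reply text is on the first choice.
print(response.choices[0].message.content)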

Streaming

Set "stream": true in the request. The SSE format is identical, so the chunk parsing you already have keeps working.

Reasoning models

delta.reasoning_content streams before delta.content on models that support it.

Native API also available

Use the native /v1 endpoint for taskUUID, async delivery, includeCost, and one shape across every modality.
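The streaming and reasoning notes above can be sketched as one small helper that collects delta.reasoning_content and delta.content separately. This is a minimal sketch, not Runware's reference code: collect_stream and the fake chunks are hypothetical, and the chunks stand in for a real stream so the example runs without a network call. The helper reads reasoning_content with getattr because not every model emits it.

```python
from types import SimpleNamespace
from typing import Iterable, Tuple


def collect_stream(chunks: Iterable) -> Tuple[str, str]:
    """Join streamed deltas into (reasoning_text, answer_text)."""
    reasoning_parts = []
    content_parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices
        delta = chunk.choices[0].delta
        # reasoning_content is only present on models that support it.
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        content = getattr(delta, "content", None)
        if content:
            content_parts.append(content)
    return "".join(reasoning_parts), "".join(content_parts)


def fake_chunk(**delta_fields):
    """Hypothetical stand-in for one streamed chunk, for offline testing."""
    delta = SimpleNamespace(**delta_fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_stream = [
    fake_chunk(reasoning_content="The user asks for "),
    fake_chunk(reasoning_content="the capital of France."),
    fake_chunk(content="Paris"),
    fake_chunk(content="."),
]

reasoning_text, answer_text = collect_stream(demo_stream)
print("reasoning:", reasoning_text)
print("answer:", answer_text)
```

With a real client, the same helper should accept the iterator returned by client.chat.completions.create(..., stream=True) from the OpenAI SDK.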

OpenAI compatibility reference

Built for production

LLM inference built for demanding products.

Custom inference hardware

Open source LLMs run on the Sonic Inference Engine, tuned end-to-end for high-throughput inference.

OpenAI-compatible LLM API

Drop-in replacement. Change the base URL and the API key.

Reasoning-model support

Native handling of internal reasoning channels. Streaming compatible with OpenAI's SSE format.

Multi-modal context

One Runware account also covers image, video, audio and 3D through the native API. No extra signup, no separate billing.

Consistent under load

Predictable behaviour as traffic ramps. No mystery degradation when usage doubles.

Enterprise SLAs

99.99% uptime tiers, dedicated capacity, committed-use rates. Available on request.

Security

Your data stays yours.

No training on customer data

Prompts and outputs are never used to train any model.

Encrypted in transit and at rest

TLS 1.3 in transit, encrypted storage for any retained data.

Tenant isolation

Inference runs in isolated execution contexts.

Zero data retention option

Available on enterprise. Requests processed in memory, discarded on completion.

GDPR-ready

EU data handling available on request. SOC 2 certified.

FAQ

Per million tokens, billed monthly. Open source rates set by Runware; closed source passed through at market leading rates. Full detail at /pricing.

Open source LLM API

Get an API key.

Pick any model on Runware. The OpenAI SDK works as-is, or use our native API for more control.
