Unified API for LLM inference.

Every leading LLM behind one drop-in OpenAI-compatible API. Open source LLMs run on Runware's own infrastructure at up to 80% lower cost.

20+ leading LLMs
1 endpoint
OpenAI-compatible

Why Runware

Ship faster. Spend less.

Run LLM inference on infrastructure built for production. Leave reliability, scale, and operations to us.

Up to 80%

Lower cost

Our open source LLM API runs on Runware infrastructure. Closed source LLMs are passed through at market-leading rates.

Predictable

Latency

Stable p50 and p95 under real production traffic. Tuned at the hardware level.

No caps

No subs, no ceilings

No subscriptions to commit to. Pay only for what you use.

99.99%

Target uptime

Built on the Sonic Inference Engine. Live status at status.runware.ai.

Available LLMs

Open source and closed source, same endpoint.

Leading closed source LLMs from Anthropic, OpenAI, Google and xAI alongside the best open source LLMs running on Runware's own hardware. Pick by quality, cost or context window. Same auth either way.

Use cases

Match the model to the job.

Every workload has a model that suits it. Pick a task to see what we'd reach for.

Coding agents & engineering

Refactor at codebase scale, generate tests, run autonomous tickets end to end. Models here lead public SWE-bench Verified and Pro deployments, with strong tool calling and long-horizon execution.

Recommended models
  • GLM-4.7 (open, ~9x cheaper than Sonnet 4.6)
    zai:[email protected]

    Best open source coder. 73.8% on SWE-bench Verified at a fraction of closed source pricing.

  • Claude Sonnet 4.6 (closed)
    anthropic:[email protected]

    Production default for coding agents. Strong on branched, multi-step tickets and computer-use scaffolds.

  • MiniMax M2.7 (open, ~13x cheaper than Sonnet 4.6)
    minimax:m2.7@0

    Reliable agentic coding with strong instruction-following on long-horizon tool workflows.

Why choose OSS

Open source has closed the gap.

On most production workloads, open source LLMs now match closed source on quality, at a fraction of the cost. On some, they're the strongest available choice.

GLM-4.7

358B MoE with interleaved thinking. Scores 73.8% on SWE-bench Verified at a fraction of closed source pricing.

Qwen3.5-397B

Sparse MoE at 397B total / 17B active. 262K native context, extensible to ~1M. Built for long-context reasoning at scale.

Kimi K2.6

Top open source option for multimodal agents. Image and video understanding alongside long-horizon software tasks.

MiniMax M2.7
minimax:m2.7@0

Long-context agentic coding tuned for production tool use. Holds quality at high throughput.

Intelligence vs cost

Same intelligence band. A fraction of the cost.

Open source

Model               Model ID                     SWE-bench Verified   Price per 1M output tokens   Relative cost
MiniMax M2.5        minimax:m2.5@0               80.2                 $0.95                        ~16x cheaper
DeepSeek-V4-Flash   deepseek:v4@flash            79.0                 $0.28                        ~54x cheaper
Qwen3.5-397B        alibaba:[email protected]     76.4                 $3.20                        ~5x cheaper
GLM-4.7             zai:[email protected]         73.8                 $1.75                        ~9x cheaper

Closed source

Model               Model ID                     SWE-bench Verified   Price per 1M output tokens
Gemini 3.1 Pro      google:[email protected]      87.9                 $12
Claude Opus 4.7     anthropic:[email protected]   87.6                 $25
GPT-5.5             openai:[email protected]      85.1                 $30
Claude Sonnet 4.6   anthropic:[email protected]   80.8                 $15

Intelligence: SWE-bench Verified scores from public May 2026 model reports. Cost: Runware list price per million output tokens, pulled live from the model catalog. Open source models run on Runware's own hardware; closed source models are passed through at market-leading rates.
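
The "x cheaper" multiples in the comparison can be reproduced with a single division. The sketch below assumes the baseline is Claude Sonnet 4.6 at $15 per million output tokens. The page states that baseline for GLM-4.7, and the other rounded figures are consistent with it. The helper name `times_cheaper` is hypothetical.

```python
# Prices per million output tokens, copied from the comparison above.
OPEN_SOURCE_PRICES = {
    "MiniMax M2.5": 0.95,
    "DeepSeek-V4-Flash": 0.28,
    "Qwen3.5-397B": 3.20,
    "GLM-4.7": 1.75,
}

# Assumption: every multiple is relative to Claude Sonnet 4.6 at $15.
BASELINE_PRICE = 15.00


def times_cheaper(price: float, baseline: float = BASELINE_PRICE) -> int:
    """Return how many times cheaper `price` is than `baseline`, rounded."""
    return round(baseline / price)


multiples = {name: times_cheaper(p) for name, p in OPEN_SOURCE_PRICES.items()}
for name, multiple in multiples.items():
    print(f"{name}: ~{multiple}x cheaper")
```

These are ratios of output-token list prices only. A blended input/output mix, as used in the pricing calculator, can give a different ratio.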

What open source buys you

A fraction of the cost to run. Auditable weights. No behaviour changes overnight, no surprise deprecations under your stack.

When closed source still wins

The most demanding reasoning, complex agent orchestration, and computer use. For most other production work, open source is the right default, and the right place to start.

LLM API pricing

What you'd save by switching to open source.

Pick a workload and a monthly token volume. We'll substitute the closed source model you're using today with an equivalent open source model on Runware, and show the monthly saving at your scale.

Full LLM pricing
Use case
Monthly volume: 50M tokens (scale: 1M, 10M, 100M, 1B)

Comparison against the equivalent closed source model billed direct from the provider. Real Runware list prices, blended across a representative input/output mix for the selected workload.

Monthly · 50M tokens

Open source on Runware: GLM-4.7, $54/mo
Closed source direct: Claude Sonnet 4.6, $451/mo
Monthly saving: $397 (88% less)
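
The saving shown for the 50M-token example follows directly from the two monthly figures. This is a minimal sketch of that arithmetic. The function name `monthly_saving` is hypothetical, and it takes the blended monthly costs as given rather than deriving them from per-token prices.

```python
def monthly_saving(open_cost: float, closed_cost: float) -> tuple[float, int]:
    """Return the absolute monthly saving and the percentage saving, rounded."""
    saving = closed_cost - open_cost
    percent = round(100 * saving / closed_cost)
    return saving, percent


# Figures from the 50M-token example: GLM-4.7 on Runware vs Claude Sonnet 4.6 direct.
saving, percent = monthly_saving(open_cost=54, closed_cost=451)
print(f"${saving}/mo saved, {percent}% less")
```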
OpenAI-compatible LLM API

Already on the OpenAI SDK? Change two values.

The Runware text endpoint speaks the OpenAI protocol. The request shape, streaming format and SDKs you already use carry over unchanged. Point your existing client at https://api.runware.ai/v1, swap the API key and pick a Runware model.

POST /v1/chat/completions
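from openai import OpenAI

# Reuse the standard OpenAI client; only the API key and base URL change.
client = OpenAI(
    api_key="your_runware_api_key",
    base_url="https://api.runware.ai/v1",
)

# Request a completion from a Runware-hosted model, addressed by its model ID.
response = client.chat.completions.create(
    model="minimax:m2.7@0",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_completion_tokens=256,
)

# The response has the usual OpenAI shape.
print(response.choices[0].message.content)
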
Streaming

Set "stream": true. The SSE format is identical, so the chunk parsing you already have keeps working.

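As an illustration of reusing existing chunk handling, here is a minimal sketch that concatenates `delta.content` from OpenAI-style streaming chunks. The helpers `collect_content`, `stream_from_runware` and `fake_chunk` are hypothetical names. The fake chunks only mimic the attribute layout of the OpenAI SDK's chunk objects so the parsing logic can be run offline, and `stream_from_runware` is defined but not called because it needs the `openai` package and a real key.

```python
from types import SimpleNamespace
from typing import Iterable


def collect_content(chunks: Iterable) -> str:
    """Concatenate delta.content from OpenAI-style streaming chunks."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # some chunks may carry no choices at all
            continue
        text = chunk.choices[0].delta.content
        if text:  # content can be None, e.g. on role-only or final chunks
            parts.append(text)
    return "".join(parts)


def stream_from_runware(prompt: str) -> str:
    """Stream a reply from the endpoint (requires the openai package and a key)."""
    from openai import OpenAI

    client = OpenAI(
        api_key="your_runware_api_key",
        base_url="https://api.runware.ai/v1",
    )
    stream = client.chat.completions.create(
        model="minimax:m2.7@0",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    return collect_content(stream)


def fake_chunk(content):
    """Build a stand-in chunk with the same attribute layout, for offline use."""
    delta = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [fake_chunk(None), fake_chunk("Par"), fake_chunk("is"), fake_chunk(None)]
print(collect_content(demo))
```
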
Reasoning models

delta.reasoning_content streams before delta.content on models that support it.
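
A sketch of how a client might keep the two channels apart while streaming. It assumes `reasoning_content` appears as an optional attribute on each chunk's delta and is simply absent on models without a reasoning channel. The names `split_reasoning` and `fake_delta` are hypothetical, and the fake chunks stand in for real stream chunks so the logic can be run offline.

```python
from types import SimpleNamespace


def split_reasoning(chunks):
    """Separate reasoning text from answer text in OpenAI-style stream chunks."""
    reasoning, answer = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        # Assumption: reasoning_content may be missing entirely, so read it
        # defensively instead of relying on the attribute being present.
        thought = getattr(delta, "reasoning_content", None)
        if thought:
            reasoning.append(thought)
        text = getattr(delta, "content", None)
        if text:
            answer.append(text)
    return "".join(reasoning), "".join(answer)


def fake_delta(**fields):
    """Build a stand-in chunk whose delta carries only the given fields."""
    delta = SimpleNamespace(**fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo = [
    fake_delta(reasoning_content="France's capital "),
    fake_delta(reasoning_content="is Paris."),
    fake_delta(content="Paris"),
]
reasoning, answer = split_reasoning(demo)
print("reasoning:", reasoning)
print("answer:", answer)
```

Keeping the channels in separate buffers lets a UI show or hide the reasoning trace without touching the final answer.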

Native API also available

Use the native /v1 endpoint for taskUUID, async delivery, includeCost, and one shape across every modality.

OpenAI compatibility reference
Built for production

LLM inference built for demanding products.

Custom inference hardware

Open source LLMs run on the Sonic Inference Engine, tuned end-to-end for high-throughput inference.

OpenAI-compatible LLM API

Drop-in replacement. Change the base URL and the API key.

Reasoning-model support

Native handling of internal reasoning channels. Streaming compatible with OpenAI's SSE format.

Multi-modal context

One Runware account also covers image, video, audio and 3D through the native API. No extra signup, no separate billing.

Consistent under load

Predictable behaviour as traffic ramps. No mystery degradation when usage doubles.

Enterprise SLAs

99.99% uptime tiers, dedicated capacity, committed-use rates. Available on request.

Security

Your data stays yours.

No training on customer data

Prompts and outputs are never used to train any model.

Encrypted in transit and at rest

TLS 1.3 in transit, encrypted storage for any retained data.

Tenant isolation

Inference runs in isolated execution contexts.

Zero data retention option

Available on enterprise. Requests processed in memory, discarded on completion.

GDPR-ready

EU data handling available on request. SOC 2 certified.

FAQ

Pricing is per million tokens, billed monthly. Open source rates are set by Runware; closed source models are passed through at market-leading rates. Full detail at /pricing.

Open source LLM API

Get an API key.

Pick any model on Runware. The OpenAI SDK works as-is, or use our native API for more control.