Pricing

How Runware pricing works, from compute-based billing to understanding costs in your integration.

Pay as you go

Our pricing philosophy is simple: we optimize models to run faster, and we pass those savings directly to you. Unlike platforms that charge a flat fee per generation regardless of inference time, we bill for the optimized compute time a generation actually uses, so fewer GPU seconds mean lower cost. Pay only for what you use, with no subscriptions or commitments.

Pricing models

We operate with two primary pricing structures depending on the model type:

Serverless (Optimized Compute)

For most open-source models (like Stable Diffusion, Flux, etc.) that we host and optimize, pricing is based on compute time.

Granular billing: You are charged for the exact resources used to generate your output.
Speed discounts: As we optimize our inference engine to be faster, the cost per generation drops automatically.
No idle costs: You don't pay for cold starts or idle GPU time.

Example: If we optimize a model to run 2x faster, your cost for that generation effectively drops by ~50%.
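The arithmetic behind this example can be sketched as follows. This is a minimal illustration that assumes cost is simply GPU seconds multiplied by a per-second rate; the rate and timings are made-up numbers, not actual Runware prices.

```python
# Hypothetical rate for illustration only; not an actual Runware price.
RATE_PER_GPU_SECOND_USD = 0.0005


def compute_cost(gpu_seconds: float, rate: float = RATE_PER_GPU_SECOND_USD) -> float:
    """Cost of one generation under compute-time billing: seconds x rate."""
    return gpu_seconds * rate


before = compute_cost(4.0)  # assumed runtime before optimization
after = compute_cost(2.0)   # same generation after a 2x speedup

# Fraction of the original cost that is saved by the speedup.
savings = 1 - after / before
print(f"before=${before:.4f} after=${after:.4f} savings={savings:.0%}")
```

Because the rate cancels out of the ratio, the saving depends only on the speedup, whatever the real per-second rate is.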

Fixed Price

For closed-source or partner models, where we do not control the underlying infrastructure or its optimization, we may offer fixed per-request pricing.

Predictable costs: You know exactly how much each request costs upfront.
Standardized: Prices are set based on the provider's rates or license agreements.

Our aggregate request volume across the platform allows us to negotiate competitive rates with providers, often resulting in lower per-request pricing than integrating with them directly.

For high-volume deployments, contact our sales team to discuss custom pricing.

What affects cost

The cost of a generation depends on several factors:

Factor	Impact
Model	Different models have different compute requirements and provider rates.
Resolution	Higher output resolution requires more processing time.
Duration	Longer video or audio outputs cost more.
Steps	More inference steps increase compute time (serverless models).
Batch size	Cost scales linearly with the number of outputs requested.

For serverless models, anything that increases GPU time increases cost. For fixed-price models, costs are determined per request based on the provider's pricing structure.
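The difference between the two structures can be sketched as a pair of estimator functions. The function names, rates, and prices below are hypothetical, and the sketch assumes GPU time per output stays constant as batch size grows, which is how linear scaling with batch size would come about.

```python
def serverless_cost(gpu_seconds_per_output: float,
                    rate_per_gpu_second: float,
                    batch_size: int = 1) -> float:
    """Compute-time billing: GPU seconds x rate, scaled linearly by batch size."""
    return gpu_seconds_per_output * rate_per_gpu_second * batch_size


def fixed_cost(price_per_request: float, requests: int = 1) -> float:
    """Fixed pricing: a flat per-request price, independent of GPU time."""
    return price_per_request * requests


# Hypothetical values for illustration only.
one_image = serverless_cost(gpu_seconds_per_output=3.0, rate_per_gpu_second=0.0005)
four_images = serverless_cost(3.0, 0.0005, batch_size=4)
partner_call = fixed_cost(price_per_request=0.04)

print(f"1 output: ${one_image:.4f}, 4 outputs: ${four_images:.4f}")
print(f"fixed-price request: ${partner_call:.4f}")
```

For a real estimate, take the per-model pricing from the model's page rather than these placeholder values.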

Understanding costs

All costs are denominated in USD. Costs are deducted from your account balance in real time as you generate content, and you can top up your balance in the Dashboard. Your balance does not expire.

To avoid service interruptions, you can configure auto-reload to automatically top up your balance when it falls below a threshold. The Dashboard also lets you set up low-balance alerts and backup payment methods.

Failed requests are not charged. You only pay for successful generations.

To see the exact cost of any request, include the includeCost parameter in your API call. The response will contain a cost field showing the amount in USD deducted for that specific task.
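As a rough sketch, a request that asks for its cost and a helper that reads it back might look like this. Only includeCost and the cost field come from the text above; the other task fields, the "data" wrapper in the response, the model identifier, and the sample response values are assumptions for illustration, so check the API reference for the exact schema.

```python
import uuid


def build_task(prompt: str, model: str) -> dict:
    """Build one task payload with cost reporting enabled.

    Field names other than includeCost are assumed, not confirmed here.
    """
    return {
        "taskType": "imageInference",      # assumed task type name
        "taskUUID": str(uuid.uuid4()),     # assumed per-task identifier
        "positivePrompt": prompt,          # assumed prompt field
        "model": model,                    # placeholder model identifier
        "includeCost": True,               # ask for the cost in the response
    }


def total_cost(response: dict) -> float:
    """Sum the cost field (USD) over all results in a response.

    Assumes results arrive as a list under a "data" key.
    """
    return sum(item.get("cost", 0.0) for item in response.get("data", []))


task = build_task("a lighthouse at dusk", model="example:model@1")

# Made-up response for illustration; real values will differ.
sample_response = {"data": [{"taskUUID": task["taskUUID"], "cost": 0.0013},
                            {"taskUUID": task["taskUUID"], "cost": 0.0026}]}
print(f"total cost: ${total_cost(sample_response):.4f}")
```

Logging the returned cost per request alongside your own request identifiers gives a per-feature cost breakdown without waiting for the account-level balance to change.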

To see the pricing for a specific model, check its page in the Models section. You can also generate a test request in the Playground to see the exact cost before integrating.