---
title: Pricing | Runware Docs
url: https://runware.ai/docs/platform/pricing
description: How Runware pricing works, from compute-based billing to understanding costs in your integration.
relatedDocuments:
  - https://runware.ai/docs/platform/account-management
---
## Pay as you go

Our pricing philosophy is simple: **we optimize models to run faster, and we pass those savings directly to you**. Unlike platforms that charge a flat fee per generation regardless of inference time, our billing is based on **optimized compute time**, so fewer GPU seconds mean lower costs. Pay only for what you use, with no subscriptions or commitments.

## Pricing models

We use two primary pricing structures, depending on the model type:

### Serverless (Optimized Compute)

For most open-source models that we host and optimize (such as Stable Diffusion and Flux), pricing is based on **compute time**.

- **Granular billing**: You are charged for the exact resources used to generate your output.
- **Speed discounts**: As we optimize our inference engine to be faster, the cost per generation drops automatically.
- **No idle costs**: You don't pay for cold starts or idle GPU time.

> [!NOTE]
> **Example**: If we optimize a model to run 2x faster, your cost for that generation effectively drops by ~50%.
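Serverless billing reduces to a simple product: GPU seconds consumed times a per-second rate. The sketch below illustrates why a 2x speedup halves the cost; the rate used is purely hypothetical, not an actual Runware price.

```python
# Hypothetical per-second GPU rate, for illustration only.
GPU_RATE_PER_SECOND = 0.0004  # USD per GPU second (made-up value)

def generation_cost(gpu_seconds: float) -> float:
    """Cost of a serverless generation billed on compute time."""
    return gpu_seconds * GPU_RATE_PER_SECOND

before = generation_cost(5.0)   # model runs in 5 seconds
after = generation_cost(2.5)    # same model after a 2x speedup
print(f"before: ${before:.4f}, after: ${after:.4f}")
```

Because the rate applies per second of compute, any inference-engine optimization flows straight through to the price of every generation.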

### Fixed Price

For closed-source or partner models where we do not control the underlying infrastructure optimization, we may offer **fixed per-request pricing**.

- **Predictable costs**: You know exactly how much each request costs upfront.
- **Standardized**: Prices are set based on the provider's rates or license agreements.

Our aggregate request volume across the platform allows us to negotiate competitive rates with providers, often resulting in lower per-request pricing than integrating with them directly.

For high-volume deployments, contact our [sales team](https://runware.ai/contact) to discuss custom pricing.

## What affects cost

The cost of a generation depends on several factors:

| Factor | Impact |
| --- | --- |
| **Model** | Different models have different compute requirements and provider rates. |
| **Resolution** | Higher output resolution requires more processing time. |
| **Duration** | Longer video or audio outputs cost more. |
| **Steps** | More inference steps increase compute time (serverless models). |
| **Batch size** | Cost scales linearly with the number of outputs requested. |

For serverless models, anything that increases GPU time increases cost. For fixed-price models, costs are determined per request based on the provider's pricing structure.

## Understanding costs

All costs are denominated in **USD**. Your account balance is deducted in real time as you generate content, and you can top up or configure auto-reload in the [Dashboard](https://runware.ai/dashboard). Your balance never expires.

To avoid service interruptions, you can configure **auto-reload** to automatically top up your balance when it falls below a threshold. The Dashboard also lets you set up **low-balance alerts** and **backup payment methods**.

Failed requests are not charged. You only pay for successful generations.

To see the exact cost of any request, include the `includeCost` parameter in your API call. The response will contain a `cost` field showing the amount in USD deducted for that specific task.
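A minimal sketch of reading the cost back, shown here against a hard-coded sample response rather than a live API call. The task-array payload shape, `taskType`, model identifier, and UUID are illustrative assumptions; the key point is setting `includeCost` in the request and reading `cost` from each result.

```python
import json

# Request payload: set "includeCost" on a task to get its cost back.
# Field values below are illustrative, not real identifiers.
payload = [{
    "taskType": "imageInference",
    "taskUUID": "a1b2c3d4-0000-0000-0000-000000000000",
    "positivePrompt": "a lighthouse at dusk",
    "model": "runware:100@1",   # illustrative model identifier
    "width": 512,
    "height": 512,
    "includeCost": True,        # ask the API to report the cost
}]

# Sample response with the "cost" field in USD; the amount is made up.
sample_response = json.loads("""
{"data": [{"taskType": "imageInference",
           "taskUUID": "a1b2c3d4-0000-0000-0000-000000000000",
           "cost": 0.0013}]}
""")

# Sum the per-task costs across all results in the response.
total_usd = sum(item.get("cost", 0.0) for item in sample_response["data"])
print(f"total cost: ${total_usd:.4f} USD")
```

Summing `cost` across results this way also works for batched requests, since each task in the response carries its own cost.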

> [!NOTE]
> To see the pricing for a specific model, check its page in the [Models](https://runware.ai/models) section. You can also generate a test request in the [Playground](https://runware.ai/playground) to see the exact cost before integrating.