---
title: Streaming (SSE) | Runware Docs
url: https://runware.ai/docs/platform/streaming
description: Stream text inference results token-by-token using Server-Sent Events. Learn how to enable streaming and parse the SSE response format.
relatedDocuments:
  - https://runware.ai/docs/platform/webhooks
  - https://runware.ai/docs/platform/task-polling
  - https://runware.ai/docs/platform/openai
---
## Overview

Streaming lets you receive text inference responses **token-by-token as they are generated**, rather than waiting for the entire response to complete. Results are delivered via [Server-Sent Events](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events) (SSE), a lightweight HTTP-based protocol designed for real-time, server-to-client data delivery.

This is particularly useful for chat interfaces and any application where **perceived latency matters**. Instead of a multi-second wait followed by a wall of text, your users see the response appear progressively.

Streaming is available for all text inference models and works alongside the existing `sync` and `async` delivery methods.

## Enabling streaming

Add `"deliveryMethod": "stream"` to your request. Everything else stays the same:

```json
[
  {
    "taskType": "textInference",
    "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
    "model": "minimax:m2.7@0",
    "deliveryMethod": "stream",
    "messages": [
      { "role": "user", "content": "Hello" }
    ],
    "settings": {
      "maxTokens": 4096,
      "temperature": 1.0
    }
  }
]
```

The response will be an SSE stream instead of a single JSON object. Each event contains a chunk of the generated text that you can display immediately.

## Delivery methods compared

The `deliveryMethod` parameter controls how results are returned. Streaming is one of three options available for text inference tasks:

| Value | Behavior | Best for |
| --- | --- | --- |
| `sync` | Waits for the full response, returns it as a single JSON object. This is the default. | Simple integrations, short responses |
| `stream` | Streams tokens as SSE events as they are generated. | Chat UIs, long-form generation |
| `async` | Returns immediately with a task acknowledgment. Poll for results using [Task Polling](https://runware.ai/docs/platform/task-polling). | Background processing, long-running tasks |

## SSE response format

The response is a standard SSE stream. Each event is a line prefixed with `data:`, followed by a JSON object, and terminated by a blank line. The stream ends with a `data: [DONE]` sentinel. The server may send `: ping` comments as keepalives, which should be ignored.

```text
: ping

data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","delta":{"text":"Hello"},"finishReason":null}

data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","delta":{"text":" there"},"finishReason":null}

data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","delta":{},"finishReason":"stop"}

data: [DONE]
```

### Parsing rules

Follow these steps to parse the SSE stream:

1. **Skip blank lines** and comment lines (lines starting with `:`).
2. **Strip the `data:` prefix** from each event line.
3. **Stop when you see `data: [DONE]`**, which signals the end of the stream.
4. **Parse each remaining line as JSON**.
5. **Read the text** from `delta.text`.
6. **Check for errors** by looking for an `errors` array in the parsed object.
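
The rules above can be sketched as a small generator. This is an illustrative helper, not part of any SDK, and the event payloads are trimmed down for brevity:

```python
import json

def parse_sse_lines(lines):
    """Yield parsed event objects from an iterable of SSE lines."""
    for line in lines:
        # 1. Skip blank lines and comment lines.
        if not line.strip() or line.startswith(':'):
            continue
        # 3. Stop at the [DONE] sentinel before attempting to parse JSON.
        if line == 'data: [DONE]':
            return
        # 2. + 4. Strip the prefix and parse the payload.
        event = json.loads(line.removeprefix('data: '))
        # 6. Surface errors to the caller.
        if 'errors' in event:
            raise RuntimeError(event['errors'][0]['message'])
        yield event

stream = [
    ': ping',
    '',
    'data: {"delta":{"text":"Hello"},"finishReason":null}',
    'data: {"delta":{"text":" there"},"finishReason":null}',
    'data: {"delta":{},"finishReason":"stop"}',
    'data: [DONE]',
]
# 5. Read the text from delta.text and concatenate the chunks.
text = ''.join(e.get('delta', {}).get('text', '') for e in parse_sse_lines(stream))
# text == 'Hello there'
```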

### Content chunks

During generation, each event contains a small piece of the response text in `delta.text`. Concatenate these chunks to build the full response:

```text
data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","delta":{"text":"The"},"finishReason":null}

data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","delta":{"text":" answer"},"finishReason":null}

data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","delta":{"text":" is"},"finishReason":null}

data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","delta":{"text":" 42."},"finishReason":null}
```

### Reasoning chunks

Some models perform internal reasoning before generating the final response. For these models, reasoning tokens arrive **first** in `delta.reasoningContent`, followed by the actual response in `delta.text`.

```text
// Reasoning chunks - delta.reasoningContent
data: {"taskUUID":"6e879837-4b2a-4c1d-ae5f-8f3c21b07a92","taskType":"textInference","delta":{"reasoningContent":"The user asks: \"What is 2+2? Be brief.\" They want a short answer. It's a simple arithmetic: 4. Provide"}}

data: {"taskUUID":"6e879837-4b2a-4c1d-ae5f-8f3c21b07a92","taskType":"textInference","delta":{"reasoningContent":" short answer."}}

// Actual response - switches to delta.text
data: {"taskUUID":"6e879837-4b2a-4c1d-ae5f-8f3c21b07a92","taskType":"textInference","delta":{"text":"4"},"finishReason":null}
```

You can display reasoning content in a collapsible section or debug panel, while streaming the final response directly to the user.
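
One way to make that routing explicit is a small dispatcher; `route_event` and the handler names are illustrative, not part of any SDK:

```python
def route_event(event, on_reasoning, on_text):
    """Dispatch a parsed SSE event depending on whether the delta
    carries reasoning tokens or response tokens."""
    delta = event.get('delta', {})
    if 'reasoningContent' in delta:
        on_reasoning(delta['reasoningContent'])  # e.g. collapsible debug panel
    elif 'text' in delta:
        on_text(delta['text'])  # streamed directly to the user

reasoning, answer = [], []
events = [
    {'delta': {'reasoningContent': 'Simple arithmetic: 4.'}},
    {'delta': {'text': '4'}, 'finishReason': None},
]
for e in events:
    route_event(e, reasoning.append, answer.append)
# reasoning == ['Simple arithmetic: 4.'], answer == ['4']
```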

### Multiple results

When you set `numberResults` greater than 1, multiple completions stream on the same connection. Each chunk includes a `resultIndex` field so you can tell which result it belongs to, since all results share the same `taskUUID`:

```text
data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","resultIndex":0,"delta":{"text":"Paris"},"finishReason":null}

data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","resultIndex":1,"delta":{"text":"The capital"},"finishReason":null}

data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","resultIndex":0,"delta":{},"finishReason":"stop"}

data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","resultIndex":1,"delta":{"text":" is Paris."},"finishReason":null}

data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","resultIndex":1,"delta":{},"finishReason":"stop"}

data: [DONE]
```

Group chunks by `resultIndex` to reconstruct each result independently. Results may finish at different times.
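
A minimal sketch of that grouping, using a hypothetical `group_by_result` helper over already-parsed events:

```python
from collections import defaultdict

def group_by_result(events):
    """Accumulate text chunks per resultIndex so interleaved
    completions can be reconstructed independently."""
    results = defaultdict(str)
    for event in events:
        text = event.get('delta', {}).get('text')
        if text is not None:
            results[event.get('resultIndex', 0)] += text
    return dict(results)

events = [
    {'resultIndex': 0, 'delta': {'text': 'Paris'}},
    {'resultIndex': 1, 'delta': {'text': 'The capital'}},
    {'resultIndex': 0, 'delta': {}, 'finishReason': 'stop'},
    {'resultIndex': 1, 'delta': {'text': ' is Paris.'}},
]
# group_by_result(events) == {0: 'Paris', 1: 'The capital is Paris.'}
```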

### Final chunk and finish reason

The last content-bearing event includes a `finishReason` value that tells you why the model stopped generating:

| Finish reason | Meaning |
| --- | --- |
| `stop` | The model completed its response naturally. |
| `length` | The response hit the `maxTokens` limit. |
| `content_filter` | Content was filtered by the safety system. |
| `tool_calls` | The model is requesting a tool call. |
| `tool_use` | The model is requesting tool use. |
| `unknown` | The model stopped for an unrecognized reason. |

```text
data: {"taskUUID":"6e879837-4b2a-4c1d-ae5f-8f3c21b07a92","taskType":"textInference","delta":{},"finishReason":"stop"}

data: [DONE]
```
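
In practice you mostly care about non-natural stops. A sketch of one reasonable policy; the `check_finish` helper and the warning messages are assumptions, not API behavior:

```python
def check_finish(event):
    """Map a finishReason to an optional warning string for the caller."""
    reason = event.get('finishReason')
    if reason == 'length':
        return 'response truncated: consider raising maxTokens'
    if reason == 'content_filter':
        return 'response filtered by the safety system'
    return None  # 'stop', tool requests, or still streaming

warning = check_finish({'delta': {}, 'finishReason': 'length'})
# warning == 'response truncated: consider raising maxTokens'
```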

### Cost and usage

Cost and token usage are reported in the **final chunk** of the stream, but only when explicitly requested:

- Set **`includeCost: true`** to receive the `cost` field with the total price of the request in USD. Useful for tracking spend and billing.
- Set **`includeUsage: true`** to receive the `usage` object with detailed token counts and processing metadata. Useful for monitoring context window usage and optimizing prompts.

**Request with cost and usage enabled**:

```json
[
  {
    "taskType": "textInference",
    "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
    "model": "minimax:m2.7@0",
    "deliveryMethod": "stream",
    "messages": [
      { "role": "user", "content": "What is 2+2? Be brief." }
    ],
    "settings": {
      "maxTokens": 4096,
      "temperature": 1.0
    },
    "includeCost": true,
    "includeUsage": true
  }
]
```

The final chunk before `[DONE]` will include both fields:

```text
data: {"taskUUID":"6e879837-4b2a-4c1d-ae5f-8f3c21b07a92","taskType":"textInference","delta":{},"finishReason":"stop","usage":{"promptTokens":51,"completionTokens":38,"totalTokens":89},"cost":0.000061}

data: [DONE]
```
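
Reading the two fields from the final chunk is straightforward. This sketch uses the values from the example above; note that `totalTokens` equals `promptTokens` plus `completionTokens`:

```python
final = {
    'taskUUID': '6e879837-4b2a-4c1d-ae5f-8f3c21b07a92',
    'taskType': 'textInference',
    'delta': {},
    'finishReason': 'stop',
    'usage': {'promptTokens': 51, 'completionTokens': 38, 'totalTokens': 89},
    'cost': 0.000061,
}

usage = final.get('usage')  # absent unless includeUsage was set
total = usage['promptTokens'] + usage['completionTokens']  # 51 + 38 == 89
cost_usd = final.get('cost')  # absent unless includeCost was set
```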

### Error handling

If an error occurs during streaming, the event will contain an `errors` array instead of a `delta` object.

```json
{
  "errors": [
    {
      "code": "timeoutProvider",
      "message": "The provider timed out while generating the response.",
      "taskType": "textInference",
      "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6"
    }
  ]
}
```

Check for the presence of `errors` in your parsing logic and handle them accordingly. Error fields follow the same structure as [standard API errors](https://runware.ai/docs/platform/errors).

## Code examples

**curl**:

```bash
# The -N flag disables curl's output buffering so chunks print as they arrive.

curl -N -X POST https://api.runware.ai/v1 \
  -H "Authorization: Bearer $RUNWARE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '[
    {
      "taskType": "textInference",
      "taskUUID": "550e8400-e29b-41d4-a716-446655440000",
      "model": "minimax:m2.7@0",
      "deliveryMethod": "stream",
      "messages": [{"role": "user", "content": "Tell me a joke"}],
      "settings": {"maxTokens": 512, "temperature": 1.0},
      "includeCost": true
    }
  ]'
```

**JavaScript**:

```javascript
const response = await fetch('https://api.runware.ai/v1', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ' + RUNWARE_API_KEY,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify([{
    taskType: 'textInference',
    taskUUID: crypto.randomUUID(),
    model: 'minimax:m2.7@0',
    deliveryMethod: 'stream',
    messages: [{ role: 'user', content: 'Tell me a joke' }],
    settings: { maxTokens: 512, temperature: 1.0 },
  }]),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

// A labeled loop lets the inner parser break out of the read loop;
// a bare `return` here would be a syntax error outside a function.
readLoop: while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep any partial line for the next read

  for (const rawLine of lines) {
    const line = rawLine.trim();
    if (!line || line.startsWith(':')) continue;
    if (line === 'data: [DONE]') break readLoop;

    const json = JSON.parse(line.slice('data:'.length).trim());

    if (json.errors) {
      console.error(json.errors[0].message);
      break readLoop;
    }

    const text = json.delta?.text;
    if (text) process.stdout.write(text);
  }
}
```

**Python**:

```python
import json
import uuid
import httpx

# httpx.stream() keeps the connection open and yields lines as they
# arrive; a plain httpx.post() would buffer the full response first,
# defeating the point of streaming.
with httpx.stream(
    'POST',
    'https://api.runware.ai/v1',
    headers={
        'Authorization': f'Bearer {RUNWARE_API_KEY}',
        'Content-Type': 'application/json',
    },
    json=[{
        'taskType': 'textInference',
        'taskUUID': str(uuid.uuid4()),
        'model': 'minimax:m2.7@0',
        'deliveryMethod': 'stream',
        'messages': [{'role': 'user', 'content': 'Tell me a joke'}],
        'settings': {'maxTokens': 512, 'temperature': 1.0},
    }],
    timeout=None,
) as response:
    for line in response.iter_lines():
        if not line or line.startswith(':'):
            continue
        if line == 'data: [DONE]':
            break

        data = json.loads(line.removeprefix('data: '))

        if 'errors' in data:
            raise Exception(data['errors'][0]['message'])

        text = data.get('delta', {}).get('text', '')
        if text:
            print(text, end='', flush=True)
```

## Best practices

- **Buffer by line, not by byte**. Network chunks may split a JSON event across multiple reads. Accumulate data in a buffer and process complete lines only.
- **Handle `[DONE]` explicitly**. Always check for the `data: [DONE]` sentinel before attempting to parse JSON. Treating it as JSON will cause a parse error.
- **Separate reasoning from content**. If you're working with reasoning models, track whether the stream is currently delivering `delta.reasoningContent` or `delta.text` and route them accordingly.
- **Implement timeouts**. Set a reasonable timeout for the overall stream connection. If no events arrive within your timeout window, close the connection and retry.
- **Use the Fetch API for browser clients**. The browser's native `EventSource` API only supports GET requests. Since text inference uses POST, use the Fetch API with `ReadableStream` instead.