Streaming (SSE)
Stream text inference results token-by-token using Server-Sent Events. Learn how to enable streaming and parse the SSE response format.
Overview
Streaming lets you receive text inference responses token-by-token as they are generated, rather than waiting for the entire response to complete. Results are delivered via Server-Sent Events (SSE), a lightweight HTTP-based protocol designed for real-time, server-to-client data delivery.
This is particularly useful for chat interfaces and any application where perceived latency matters. Instead of a multi-second wait followed by a wall of text, your users see the response appear progressively.
Streaming is available for all text inference models and works alongside the existing sync and async delivery methods.
Enabling streaming
Add "deliveryMethod": "stream" to your request. Everything else stays the same:
[
  {
    "taskType": "textInference",
    "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
    "model": "minimax:m2.7@0",
    "deliveryMethod": "stream",
    "messages": [
      { "role": "user", "content": "Hello" }
    ],
    "settings": {
      "maxTokens": 4096,
      "temperature": 1.0
    }
  }
]

The response will be an SSE stream instead of a single JSON object. Each event contains a chunk of the generated text that you can display immediately.
Delivery methods compared
The deliveryMethod parameter controls how results are returned. Streaming is one of three options available for text inference tasks:
| Value | Behavior | Best for |
|---|---|---|
| sync | Waits for the full response, returns it as a single JSON object. This is the default. | Simple integrations, short responses |
| stream | Streams tokens as SSE events as they are generated. | Chat UIs, long-form generation |
| async | Returns immediately with a task acknowledgment. Poll for results using Task Polling. | Background processing, long-running tasks |
SSE response format
The response is a standard SSE stream. Each event is a line prefixed with data:, followed by a JSON object, and terminated by a blank line. The stream ends with a data: [DONE] sentinel. The server may send : ping comments as keepalives, which should be ignored.
: ping
data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","delta":{"text":"Hello"},"finishReason":null}
data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","delta":{"text":" there"},"finishReason":null}
data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","delta":{},"finishReason":"stop"}
data: [DONE]

Parsing rules
Follow these steps to parse the SSE stream:
- Skip blank lines and comment lines (lines starting with :).
- Strip the data: prefix from each event line.
- Stop when you see data: [DONE], which signals the end of the stream.
- Parse each remaining line as JSON.
- Read the text from delta.text.
- Check for errors by looking for an errors array in the parsed object.
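The steps above can be sketched as a small helper; the function name parse_sse_line and the choice of RuntimeError are illustrative, not part of the API:

```python
import json

def parse_sse_line(line: str):
    """Parse one SSE line; return the event dict, 'DONE', or None to skip."""
    if not line.strip() or line.startswith(':'):
        return None                      # blank line or keepalive comment
    if not line.startswith('data: '):
        return None                      # not an event line
    payload = line.removeprefix('data: ')
    if payload == '[DONE]':
        return 'DONE'                    # end-of-stream sentinel
    event = json.loads(payload)
    if 'errors' in event:                # error events replace delta with errors
        raise RuntimeError(event['errors'][0]['message'])
    return event

# Example: feed it one event line from the stream.
event = parse_sse_line('data: {"delta":{"text":"Hello"},"finishReason":null}')
```

Call this once per complete line; how you accumulate partial lines into complete ones depends on your HTTP client (see the code examples below).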
Content chunks
During generation, each event contains a small piece of the response text in delta.text. Concatenate these chunks to build the full response:
data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","delta":{"text":"The"},"finishReason":null}
data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","delta":{"text":" answer"},"finishReason":null}
data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","delta":{"text":" is"},"finishReason":null}
data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","delta":{"text":" 42."},"finishReason":null}

Reasoning chunks
Some models perform internal reasoning before generating the final response. For these models, reasoning tokens arrive first in delta.reasoningContent, followed by the actual response in delta.text.
// Reasoning chunks - delta.reasoningContent
data: {"taskUUID":"6e879837-4b2a-4c1d-ae5f-8f3c21b07a92","taskType":"textInference","delta":{"reasoningContent":"The user asks: \"What is 2+2? Be brief.\" They want a short answer. It's a simple arithmetic: 4. Provide"}}
data: {"taskUUID":"6e879837-4b2a-4c1d-ae5f-8f3c21b07a92","taskType":"textInference","delta":{"reasoningContent":" short answer."}}
// Actual response - switches to delta.text
data: {"taskUUID":"6e879837-4b2a-4c1d-ae5f-8f3c21b07a92","taskType":"textInference","delta":{"text":"4"},"finishReason":null}

You can display reasoning content in a collapsible section or debug panel, while streaming the final response directly to the user.
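One way to route the two channels, sketched in Python; the event dicts here are hand-written stand-ins for parsed stream chunks:

```python
reasoning_parts, answer_parts = [], []

# Hand-written stand-ins for parsed stream events.
events = [
    {"delta": {"reasoningContent": "It's simple arithmetic: "}},
    {"delta": {"reasoningContent": "2+2 = 4."}},
    {"delta": {"text": "4"}, "finishReason": None},
]

for event in events:
    delta = event.get("delta", {})
    # Reasoning tokens go to a debug panel; answer tokens go to the user.
    if "reasoningContent" in delta:
        reasoning_parts.append(delta["reasoningContent"])
    if "text" in delta:
        answer_parts.append(delta["text"])

reasoning = "".join(reasoning_parts)
answer = "".join(answer_parts)
```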
Multiple results
When you set numberResults greater than 1, multiple completions stream on the same connection. Each chunk includes a resultIndex field so you can tell which result it belongs to, since all results share the same taskUUID:
data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","resultIndex":0,"delta":{"text":"Paris"},"finishReason":null}
data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","resultIndex":1,"delta":{"text":"The capital"},"finishReason":null}
data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","resultIndex":0,"delta":{},"finishReason":"stop"}
data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","resultIndex":1,"delta":{"text":" is Paris."},"finishReason":null}
data: {"taskUUID":"a770f077-f413-47de-9dac-be0b26a35da6","taskType":"textInference","resultIndex":1,"delta":{},"finishReason":"stop"}
data: [DONE]

Group chunks by resultIndex to reconstruct each result independently. Results may finish at different times.
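A minimal sketch of that grouping, again using hand-written stand-ins for parsed events:

```python
from collections import defaultdict

# Hand-written stand-ins for parsed stream events (numberResults = 2).
events = [
    {"resultIndex": 0, "delta": {"text": "Paris"}, "finishReason": None},
    {"resultIndex": 1, "delta": {"text": "The capital"}, "finishReason": None},
    {"resultIndex": 0, "delta": {}, "finishReason": "stop"},
    {"resultIndex": 1, "delta": {"text": " is Paris."}, "finishReason": None},
    {"resultIndex": 1, "delta": {}, "finishReason": "stop"},
]

results = defaultdict(list)   # resultIndex -> list of text chunks
finished = {}                 # resultIndex -> finishReason

for event in events:
    idx = event.get("resultIndex", 0)   # assume index 0 when the field is absent
    text = event.get("delta", {}).get("text")
    if text:
        results[idx].append(text)
    if event.get("finishReason"):
        finished[idx] = event["finishReason"]

completions = {idx: "".join(chunks) for idx, chunks in results.items()}
# completions[0] == "Paris", completions[1] == "The capital is Paris."
```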
Final chunk and finish reason
The last content-bearing event includes a finishReason value that tells you why the model stopped generating:
| Finish reason | Meaning |
|---|---|
| stop | The model completed its response naturally. |
| length | The response hit the maxTokens limit. |
| content_filter | Content was filtered by the safety system. |
| tool_calls | The model is requesting a tool call. |
| tool_use | The model is requesting a tool use. |
| unknown | The model stopped for an unrecognized reason. |
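A sketch of branching on the finish reason; the action strings here are invented placeholders for whatever your client actually does:

```python
def handle_finish(reason: str) -> str:
    """Map a finishReason value to a follow-up action (action names illustrative)."""
    actions = {
        "stop": "done",                         # response complete, nothing to do
        "length": "warn-truncated",             # consider raising maxTokens
        "content_filter": "show-filtered-notice",
        "tool_calls": "run-tool",               # execute the requested tool call
        "tool_use": "run-tool",
        "unknown": "log-and-continue",
    }
    return actions.get(reason, "log-and-continue")
```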
data: {"taskUUID":"6e879837-4b2a-4c1d-ae5f-8f3c21b07a92","taskType":"textInference","delta":{},"finishReason":"stop"}
data: [DONE]

Cost and usage
Cost and token usage are reported in the final chunk of the stream, but only when explicitly requested:
- Set includeCost: true to receive the cost field with the total price of the request in USD. Useful for tracking spend and billing.
- Set includeUsage: true to receive the usage object with detailed token counts and processing metadata. Useful for monitoring context window usage and optimizing prompts.
[
  {
    "taskType": "textInference",
    "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
    "model": "minimax:m2.7@0",
    "deliveryMethod": "stream",
    "messages": [
      { "role": "user", "content": "What is 2+2? Be brief." }
    ],
    "settings": {
      "maxTokens": 4096,
      "temperature": 1.0
    },
    "includeCost": true,
    "includeUsage": true
  }
]

The final chunk before [DONE] will include both fields:
data: {"taskUUID":"6e879837-4b2a-4c1d-ae5f-8f3c21b07a92","taskType":"textInference","delta":{},"finishReason":"stop","usage":{"promptTokens":51,"completionTokens":38,"totalTokens":89},"cost":0.000061}
data: [DONE]

Error handling
If an error occurs during streaming, the event will contain an errors array instead of a delta object.
{
"errors": [
{
"code": "timeoutProvider",
"message": "The provider timed out while generating the response.",
"taskType": "textInference",
"taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6"
}
]
}

Check for the presence of errors in your parsing logic and handle them accordingly. Error fields follow the same structure as standard API errors.
Code examples
# The -N flag disables curl's output buffering so chunks print as they arrive.
curl -N -X POST https://api.runware.ai/v1 \
  -H "Authorization: Bearer $RUNWARE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '[
    {
      "taskType": "textInference",
      "taskUUID": "550e8400-e29b-41d4-a716-446655440000",
      "model": "minimax:m2.7@0",
      "deliveryMethod": "stream",
      "messages": [{"role": "user", "content": "Tell me a joke"}],
      "settings": {"maxTokens": 512, "temperature": 1.0},
      "includeCost": true
    }
  ]'

const response = await fetch('https://api.runware.ai/v1', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ' + RUNWARE_API_KEY,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify([{
    taskType: 'textInference',
    taskUUID: crypto.randomUUID(),
    model: 'minimax:m2.7@0',
    deliveryMethod: 'stream',
    messages: [{ role: 'user', content: 'Tell me a joke' }],
    settings: { maxTokens: 512, temperature: 1.0 },
  }]),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';

// Label the outer loop so [DONE] and errors can exit both loops
// (a bare `return` is invalid at module top level).
outer: while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop();
  for (const line of lines) {
    if (!line.trim() || line.startsWith(':')) continue;
    if (line === 'data: [DONE]') break outer;
    const json = JSON.parse(line.replace('data: ', ''));
    if (json.errors) {
      console.error(json.errors[0].message);
      break outer;
    }
    const text = json.delta?.text;
    if (text) process.stdout.write(text);
  }
}

import json
import uuid
import httpx
# httpx.stream keeps the connection open so lines arrive as they are generated;
# httpx.post would buffer the entire response before iter_lines() yields anything.
with httpx.stream(
    'POST',
    'https://api.runware.ai/v1',
    headers={
        'Authorization': f'Bearer {RUNWARE_API_KEY}',
        'Content-Type': 'application/json',
    },
    json=[{
        'taskType': 'textInference',
        'taskUUID': str(uuid.uuid4()),
        'model': 'minimax:m2.7@0',
        'deliveryMethod': 'stream',
        'messages': [{'role': 'user', 'content': 'Tell me a joke'}],
        'settings': {'maxTokens': 512, 'temperature': 1.0},
    }],
    timeout=None,
) as response:
    for line in response.iter_lines():
        if not line or line.startswith(':'):
            continue
        if line == 'data: [DONE]':
            break
        data = json.loads(line.removeprefix('data: '))
        if 'errors' in data:
            raise Exception(data['errors'][0]['message'])
        text = data.get('delta', {}).get('text', '')
        if text:
            print(text, end='', flush=True)

Best practices
- Buffer by line, not by byte. Network chunks may split a JSON event across multiple reads. Accumulate data in a buffer and process complete lines only.
- Handle [DONE] explicitly. Always check for the data: [DONE] sentinel before attempting to parse JSON. Treating it as JSON will cause a parse error.
- Separate reasoning from content. If you're working with reasoning models, track whether the stream is currently delivering delta.reasoningContent or delta.text and route them accordingly.
- Implement timeouts. Set a reasonable timeout for the overall stream connection. If no events arrive within your timeout window, close the connection and retry.
- Use the Fetch API for browser clients. The browser's native EventSource API only supports GET requests. Since text inference uses POST, use the Fetch API with ReadableStream instead.