---
title: Building a tool-calling agent with Kimi K2.6 — Kimi K2.6 | Runware Docs
url: https://runware.ai/docs/models/moonshotai-kimi-k2-6/guides/tool-calling
description: "How to give Kimi K2.6 access to your own functions through Runware's OpenAI-compatible endpoint: defining tools, running the call loop, making parallel tool calls, and controlling tool selection."
---
### [Introduction](https://runware.ai/docs/models/moonshotai-kimi-k2-6/guides/tool-calling#introduction)

Tool calling lets Kimi K2.6 pause mid-response to call functions you define, read what they return, and finish its answer with that data in hand. You describe each function as a JSON schema, the model decides **when to call it and with what arguments**, and your code runs the function and passes the result back. Kimi K2.6 is built for this style of work: it handles **multi-step, long-horizon tasks** and chains several tool calls together before it commits to an answer.

The assistant below was given one instruction and four read/write tools (service health, logs, deploys, and a pager). From that single request, Kimi ran **four tool calls across three rounds**: it pulled health, deploys, and logs together, paged the on-call engineer based on what it found, then wrote up the incident.

Kimi's reply after 4 tool calls

```markdown
**Status:** Checkout is degraded — error rate is **18.2%** and p95 latency is **4,200 ms**.

**Root cause:** Deploy **v2.4.1** went live at 13:30. Logs show a **NullPointerException in `CheckoutController.applyPromo`** starting at 13:42, tied directly to the new release.

**Action taken:** Paged the checkout on-call engineer and opened **INC-4827**.

**Recommended immediate next step:** Roll back checkout to **v2.4.0** to stop the bleeding while the team investigates the `applyPromo` bug.
```

The model never saw your services directly. It only saw the tools you described and the JSON they returned. This guide builds that assistant from a single call up to the full loop.

### [Connecting to the endpoint](https://runware.ai/docs/models/moonshotai-kimi-k2-6/guides/tool-calling#connecting-to-the-endpoint)

Tool calling runs over Runware's [OpenAI-compatible Chat Completions endpoint](https://runware.ai/docs/platform/openai) at `https://api.runware.ai/v1`. Point any OpenAI client at that base URL with your Runware API key, and set the model to `moonshotai-kimi-k2-6`. The request and response follow the **standard OpenAI tool-calling format**, so existing agent code and frameworks need no changes beyond the base URL, API key, and model name.

**Python**:

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_RUNWARE_API_KEY",
    base_url="https://api.runware.ai/v1",
)
```

**TypeScript**:

```typescript
import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: process.env.RUNWARE_API_KEY,
  baseURL: 'https://api.runware.ai/v1',
})
```

### [Anatomy of a tool call](https://runware.ai/docs/models/moonshotai-kimi-k2-6/guides/tool-calling#anatomy-of-a-tool-call)

A tool call is a **four-message round-trip**. You send the conversation plus a list of `tools`, the model replies with a tool call instead of text, you run the function and send the result back as a `tool` message, and the model answers. Walk through one call against the `payments` service:

**1. Request**:

```json
{
  "model": "moonshotai-kimi-k2-6",
  "messages": [
    { "role": "system", "content": "You are an on-call site-reliability assistant. Before answering, use the provided tools to inspect live service health, recent logs, and recent deploys. Be concise and decisive, cite the numbers you found, and recommend one concrete next action." },
    { "role": "user", "content": "What is the current error rate on the payments service?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_service_health",
        "description": "Return the current health of a service: status, error rate, and p95 latency.",
        "parameters": {
          "type": "object",
          "properties": {
            "service": { "type": "string", "enum": ["checkout", "payments", "search"] }
          },
          "required": ["service"]
        }
      }
    }
  ],
  "tool_choice": "auto",
  "temperature": 0.2,
  "max_completion_tokens": 1024
}
```

**2. Response**:

```json
{
  "id": "chatcmpl-1fb9303b72d3939c40a33ecd",
  "object": "chat.completion",
  "model": "moonshotai-kimi-k2-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "functions.get_service_health:0",
            "type": "function",
            "function": {
              "name": "get_service_health",
              "arguments": "{\"service\":\"payments\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": { "prompt_tokens": 144, "completion_tokens": 17, "total_tokens": 161 }
}
```

**3. Tool result**:

```json
{
  "role": "tool",
  "tool_call_id": "functions.get_service_health:0",
  "content": "{\"status\":\"healthy\",\"healthy\":true,\"errorRate\":0.002,\"p95LatencyMs\":180}"
}
```

**4. Final answer**:

```markdown
The payments service is currently **healthy** with an error rate of
**0.2%** (0.002) and a p95 latency of **180 ms**. No immediate action is required.
```

Three details carry the protocol. When `finish_reason` is `tool_calls`, the assistant message has **no text to show the user yet**: its `content` is null and the call lives in the `tool_calls` array. The `arguments` field is a **JSON-encoded string, not an object**, so you parse it before use. Your reply is a `tool` message whose `tool_call_id` must match the call's `id`, which is how the model pairs your result with the request it made. Append both the assistant message and your tool message to the history, send it back, and the model answers with `finish_reason: "stop"`.
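
The round-trip above can be sketched in Python without any network call. The schema is the one from the request in step 1. The stub `get_service_health` implementation with its hard-coded return value and the helper name `build_tool_message` are assumptions for illustration; `TOOL_IMPLS` is the name-to-function mapping that the loop in the next section expects.

```python
import json

# Schema copied from the request in step 1.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_service_health",
            "description": "Return the current health of a service: status, error rate, and p95 latency.",
            "parameters": {
                "type": "object",
                "properties": {
                    "service": {"type": "string", "enum": ["checkout", "payments", "search"]}
                },
                "required": ["service"],
            },
        },
    }
]


def get_service_health(args):
    # Stub (assumption): a real implementation would query your monitoring system.
    return {"status": "healthy", "healthy": True, "errorRate": 0.002, "p95LatencyMs": 180}


# Maps each tool name to the function that implements it.
TOOL_IMPLS = {"get_service_health": get_service_health}


def build_tool_message(tool_call):
    """Run one tool call (a dict shaped like step 2) and build the step 3 message."""
    args = json.loads(tool_call["function"]["arguments"])  # arguments is a JSON string
    result = TOOL_IMPLS[tool_call["function"]["name"]](args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],  # must match the id of the call
        "content": json.dumps(result),
    }


call = {
    "id": "functions.get_service_health:0",
    "type": "function",
    "function": {"name": "get_service_health", "arguments": "{\"service\":\"payments\"}"},
}
tool_message = build_tool_message(call)
```

Here the call is a plain dict for readability. With the OpenAI Python client the same fields are attributes, as in `call.function.arguments`.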

### [The agent loop](https://runware.ai/docs/models/moonshotai-kimi-k2-6/guides/tool-calling#the-agent-loop)

One call answers one question. Real tasks need a loop, because the model often calls more tools after seeing the first results. The rule is simple: send the conversation, append whatever comes back, and **repeat until the model returns no more tool calls**.

**Python**:

```python
import json

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Checkout users are reporting failed orders in the last few minutes. Find out what is going on and take the appropriate action."},
]

while True:
    response = client.chat.completions.create(
        model="moonshotai-kimi-k2-6",
        messages=messages,
        tools=tools,
    )
    message = response.choices[0].message
    messages.append(message)

    if not message.tool_calls:
        break

    for call in message.tool_calls:
        args = json.loads(call.function.arguments)
        result = TOOL_IMPLS[call.function.name](args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })

print(messages[-1].content)
```

**TypeScript**:

```typescript
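// Typed explicitly so assistant and tool messages can be pushed later.
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [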
  { role: 'system', content: SYSTEM_PROMPT },
  { role: 'user', content: 'Checkout users are reporting failed orders in the last few minutes. Find out what is going on and take the appropriate action.' },
]

while (true) {
  const response = await client.chat.completions.create({
    model: 'moonshotai-kimi-k2-6',
    messages,
    tools,
  })
  const message = response.choices[0].message
  messages.push(message)

  if (!message.tool_calls) break

  for (const call of message.tool_calls) {
    const args = JSON.parse(call.function.arguments)
    const result = TOOL_IMPLS[call.function.name](args)
    messages.push({
      role: 'tool',
      tool_call_id: call.id,
      content: JSON.stringify(result),
    })
  }
}

console.log(messages.at(-1)?.content)
```

That loop is what produced the incident response at the top. It ran three rounds:

- **Round 1:** From the single instruction, Kimi requested three tools at once: `get_service_health`, `get_deploys`, and `search_logs`, all for `checkout`. The loop ran them and appended three `tool` messages.
- **Round 2:** With health showing `degraded` (18.2% errors), a `v2.4.1` deploy 12 minutes before the first errors, and a log line tying the 500s to that release, Kimi made a **follow-up call that depended on the first round's results**: it paged the checkout team with a written summary.
- **Round 3:** After the page returned an incident ID, the model stopped calling tools and wrote its answer.

Round 2 is the part a single call can't do. The model read three tool results, drew a conclusion, and chose a new action based on it:

Round 2: a tool call that depends on round 1's results

```json
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "functions.page_engineer:3",
            "type": "function",
            "function": {
              "name": "page_engineer",
              "arguments": "{\"team\":\"checkout\",\"summary\":\"Checkout degraded after v2.4.1 deploy: 18.2% error rate, NullPointerException in applyPromo\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
```

> [!WARNING]
> Always bound the loop. A model can call tools for several rounds, and a bug in a tool (or a tool that keeps failing) can keep it looping. Cap the number of iterations and stop with a clear error when you hit the cap, rather than relying on the model to always finish on its own.
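
One way to apply that cap is to replace `while True` with a counted loop that raises when the budget runs out. This is a minimal sketch, not the only way to do it: `run_agent`, `create_completion`, and the default of 8 rounds are hypothetical, and messages are plain dicts. `create_completion` stands in for whatever wraps your chat completions request and returns the assistant message.

```python
import json


def run_agent(create_completion, messages, tool_impls, max_rounds=8):
    """Run the tool loop, raising if the model has not finished after max_rounds."""
    for _ in range(max_rounds):
        message = create_completion(messages)
        messages.append(message)

        tool_calls = message.get("tool_calls")
        if not tool_calls:
            return message["content"]  # the model answered with text

        for call in tool_calls:
            args = json.loads(call["function"]["arguments"])
            result = tool_impls[call["function"]["name"]](args)
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })

    raise RuntimeError(f"Agent did not finish within {max_rounds} rounds")


def always_calls_tool(messages):
    # Fake model (assumption) that never stops requesting tools, to exercise the cap.
    return {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": f"call:{len(messages)}",
            "type": "function",
            "function": {"name": "noop", "arguments": "{}"},
        }],
    }


try:
    run_agent(always_calls_tool, [{"role": "user", "content": "hi"}],
              {"noop": lambda args: {}}, max_rounds=3)
    outcome = "finished"
except RuntimeError as exc:
    outcome = str(exc)
```

Raising rather than returning a partial answer keeps a runaway loop visible to the caller, which can then decide whether to retry, alert, or fall back.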

### [Parallel tool calls](https://runware.ai/docs/models/moonshotai-kimi-k2-6/guides/tool-calling#parallel-tool-calls)

When the model needs several independent pieces of information, it returns them as **multiple entries in one `tool_calls` array** rather than asking one round at a time. Asked for a health snapshot of three services, Kimi emitted all three calls in a single message:

One assistant message, three tool calls

```json
{
  "message": {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      { "id": "functions.get_service_health:0", "type": "function", "function": { "name": "get_service_health", "arguments": "{\"service\":\"checkout\"}" } },
      { "id": "functions.get_service_health:1", "type": "function", "function": { "name": "get_service_health", "arguments": "{\"service\":\"payments\"}" } },
      { "id": "functions.get_service_health:2", "type": "function", "function": { "name": "get_service_health", "arguments": "{\"service\":\"search\"}" } }
    ]
  },
  "finish_reason": "tool_calls"
}
```

The loop above already handles this: it iterates every entry in `tool_calls` and appends one `tool` message per call. Because each result is matched back by its own `tool_call_id`, **you can run independent calls concurrently** and append the results in any order. Once all three came back, the model answered with a two-line summary:

```markdown
**Checkout** is degraded (18.2% error rate, 4.2s p95 latency); **payments** and **search** are healthy.
**Next action:** Investigate checkout immediately—roll back the most recent checkout deploy if one occurred within the latency/error-rate regression window.
```
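
Running the calls concurrently could look like the sketch below, which uses a thread pool from the standard library. It assumes the tools are I/O-bound and independent of each other. The helper name `run_tool_calls_concurrently` and the stub `get_service_health` are hypothetical, and tool calls are plain dicts shaped like the entries above.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def run_tool_calls_concurrently(tool_calls, tool_impls, max_workers=8):
    """Execute independent tool calls in threads; return one tool message per call."""

    def run_one(call):
        args = json.loads(call["function"]["arguments"])
        result = tool_impls[call["function"]["name"]](args)
        return {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps input order, although any order is acceptable to the model.
        return list(pool.map(run_one, tool_calls))


def get_service_health(args):
    # Stub (assumption): statuses mirror the summary above, without real metrics.
    status = "degraded" if args["service"] == "checkout" else "healthy"
    return {"service": args["service"], "status": status}


tool_calls = [
    {"id": f"functions.get_service_health:{i}", "type": "function",
     "function": {"name": "get_service_health", "arguments": json.dumps({"service": name})}}
    for i, name in enumerate(["checkout", "payments", "search"])
]
tool_messages = run_tool_calls_concurrently(tool_calls, {"get_service_health": get_service_health})
```

Tools with side effects that depend on one another, such as paging after a rollback, are better run sequentially.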

### [Controlling tool selection](https://runware.ai/docs/models/moonshotai-kimi-k2-6/guides/tool-calling#controlling-tool-selection)

The `tool_choice` parameter decides how freely the model reaches for tools. The default, `auto`, is what every example above uses.

| Value | Behavior |
| --- | --- |
| `"auto"` | The model decides whether to call a tool. The default, and the right choice for most assistants. |
| `"none"` | The model answers with text and calls no tools, even if some are defined. |
| `"required"` | The model must call at least one tool before it can answer with text. |
| `{ "type": "function", "function": { "name": "..." } }` | The model must call the named function. |

Naming a function is useful when an action is already decided and you only need the model to **fill in the arguments**. Forcing `page_engineer` on a paging request guarantees the model emits the `page_engineer` call, so the page fires as soon as your loop executes it:

```json
{
  "model": "moonshotai-kimi-k2-6",
  "messages": [ ... ],
  "tools": [ ... ],
  "tool_choice": { "type": "function", "function": { "name": "page_engineer" } }
}
```

> [!NOTE]
> Naming a function guarantees Kimi calls it, but the model may still issue other tool calls it judges necessary in the same turn. In the paging run above, Kimi paged as instructed and also called `get_service_health` and `get_deploys` to fill in the incident summary. If you need exactly one call and nothing else, expose only that one tool on the request.
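
A small helper can apply both constraints at once: expose only the named tool and force its selection. This is a sketch under stated assumptions. `force_single_tool` is a hypothetical name, and the tool entries in the demo are abbreviated placeholders rather than full schemas.

```python
def force_single_tool(tools, name):
    """Return request kwargs that expose only `name` and force the model to call it."""
    selected = [tool for tool in tools if tool["function"]["name"] == name]
    if not selected:
        raise ValueError(f"Unknown tool: {name}")
    return {
        "tools": selected,
        "tool_choice": {"type": "function", "function": {"name": name}},
    }


# Abbreviated tool entries (assumption): real ones carry descriptions and parameters.
all_tools = [
    {"type": "function", "function": {"name": "get_service_health"}},
    {"type": "function", "function": {"name": "get_deploys"}},
    {"type": "function", "function": {"name": "page_engineer"}},
]

kwargs = force_single_tool(all_tools, "page_engineer")
# Pass the result into the request, for example:
# client.chat.completions.create(model="moonshotai-kimi-k2-6", messages=messages, **kwargs)
```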

### [Designing tools Kimi can use well](https://runware.ai/docs/models/moonshotai-kimi-k2-6/guides/tool-calling#designing-tools-kimi-can-use-well)

The model only ever sees the schemas you hand it, so a tool's **name and description are its entire interface**. Name functions the way you would name API endpoints: `get_service_health` and `search_logs` read as actions, while `handler` or `doStuff` give the model nothing to reason about. Treat the description as **instructions for choosing between tools** mid-conversation, not as a changelog entry.

Constrain the arguments as tightly as the real inputs allow. An `enum` for a fixed set of values and `required` on the fields you depend on are **the difference between a clean call and a malformed one**. The `service` enum in these examples is why Kimi never invented a service name.
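
The same constraints can be enforced on your side before a tool runs. The sketch below checks only `required` and `enum` using the standard library; it is not a full JSON Schema validator, and a dedicated library may be a better fit in production. The `team` and `summary` argument names come from the `page_engineer` call shown earlier, but their types and the `team` enum values are assumptions, as is the helper name `check_arguments`.

```python
# Hypothetical parameter schema for page_engineer (enum values are assumed).
PAGE_ENGINEER_PARAMETERS = {
    "type": "object",
    "properties": {
        "team": {"type": "string", "enum": ["checkout", "payments", "search"]},
        "summary": {"type": "string"},
    },
    "required": ["team", "summary"],
}


def check_arguments(args, parameters):
    """Return a list of problems; an empty list means the arguments fit.

    Checks only `required`, unknown fields, and `enum`.
    """
    problems = []
    for field in parameters.get("required", []):
        if field not in args:
            problems.append(f"missing required field: {field}")
    for field, value in args.items():
        spec = parameters.get("properties", {}).get(field)
        if spec is None:
            problems.append(f"unexpected field: {field}")
        elif "enum" in spec and value not in spec["enum"]:
            problems.append(f"{field} must be one of {spec['enum']}, got {value!r}")
    return problems


good = check_arguments(
    {"team": "checkout", "summary": "Checkout degraded after v2.4.1 deploy"},
    PAGE_ENGINEER_PARAMETERS,
)
bad = check_arguments({"team": "billing"}, PAGE_ENGINEER_PARAMETERS)
```

Returning the problem list to the model as the tool result gives it enough detail to retry with corrected arguments.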

What a tool returns matters as much as what it accepts, because the result becomes the model's next input. **Compact JSON objects with named fields** like `{ "status": "degraded", "errorRate": 0.182 }` are easier to reason over than prose or large blobs. **Keep the overall set small**, too: a handful of well-scoped tools beats a large catalog, since overlapping tools force the model to guess which one you meant.

### [Tips](https://runware.ai/docs/models/moonshotai-kimi-k2-6/guides/tool-calling#tips)

1. **Echo every tool call back into the history.** Append both the assistant message that requested the call and your `tool` reply. Dropping the assistant message breaks the `tool_call_id` pairing and the next request will error.
    
2. **Parse `arguments` defensively.** It's a JSON string the model generated. Validate it against your schema before executing, and return an error object as the tool result if it doesn't fit, so the model can correct itself on the next round.
    
3. **Return errors as tool results, not exceptions.** When a tool fails, send `{ "error": "..." }` back as the `tool` message. The model can read that and adjust, whereas an uncaught exception simply ends the loop.
    
4. **Watch `cached_tokens` on long conversations.** The system prompt and tool definitions repeat on every round, so Runware's prompt cache reuses them: across the incident's three rounds, `usage.prompt_tokens_details.cached_tokens` climbed from 0 to 608. Keeping the stable parts of your prompt in a fixed prefix maximizes that reuse.
    
5. **Set the action threshold in the system prompt.** Whether the model gathers context before acting, or acts immediately, is shaped by instructions like "inspect health and logs before taking action." Tighten or loosen that line to match how much autonomy you want.
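
Tips 2 and 3 can be combined in one helper that always produces a `tool` message, whether the call succeeded or not. This is a minimal sketch with a hypothetical helper name, `execute_tool_call`, and a hypothetical failing tool; tool calls are plain dicts, and the error format is one reasonable choice rather than a required shape.

```python
import json


def execute_tool_call(call, tool_impls):
    """Run one tool call and always return a tool message, even on failure."""
    name = call["function"]["name"]
    try:
        args = json.loads(call["function"]["arguments"])  # may raise on malformed JSON
        if not isinstance(args, dict):
            raise ValueError("arguments must be a JSON object")
        if name not in tool_impls:
            raise ValueError(f"unknown tool: {name}")
        result = tool_impls[name](args)
    except Exception as exc:
        # Report the failure to the model instead of letting it end the loop.
        result = {"error": f"{type(exc).__name__}: {exc}"}
    return {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}


def search_logs(args):
    # Hypothetical tool that fails, to show the error path.
    raise RuntimeError("log backend timeout")


malformed = execute_tool_call(
    {"id": "functions.search_logs:0", "type": "function",
     "function": {"name": "search_logs", "arguments": "{not json"}},
    {"search_logs": search_logs},
)
failed = execute_tool_call(
    {"id": "functions.search_logs:1", "type": "function",
     "function": {"name": "search_logs", "arguments": "{\"service\":\"checkout\"}"}},
    {"search_logs": search_logs},
)
```

Catching every exception is deliberate here, since any failure should reach the model as data. Log the original exception as well so the failure stays visible to you.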