MODEL ID moonshotai-kimi-k2-6

Kimi K2.6

by Moonshot AI

Kimi K2.6 is Moonshot AI's latest flagship open model for coding, reasoning, multimodal understanding, and agentic execution. It is designed for long-horizon software tasks, reliable tool use, autonomous multi-step workflows, coordinated agent swarms, and visual understanding across image and video inputs in addition to text.

Building a tool-calling agent with Kimi K2.6

How to give Kimi K2.6 access to your own functions through Runware's OpenAI-compatible endpoint: defining tools, running the call loop, making parallel tool calls, and controlling tool selection.

Introduction

Tool calling lets Kimi K2.6 pause mid-response to call functions you define, read what they return, and finish its answer with that data in hand. You describe each function as a JSON schema, the model decides when to call it and with what arguments, and your code runs the function and passes the result back. Kimi K2.6 is built for this style of work: it handles multi-step, long-horizon tasks and chains several tool calls together before it commits to an answer.

The assistant below was given one instruction and four read/write tools (service health, logs, deploys, and a pager). From that single request, Kimi ran four tool calls across three rounds: it pulled health, deploys, and logs together, paged the on-call engineer based on what it found, then wrote up the incident.

Kimi's reply after 4 tool calls
**Status:** Checkout is degraded — error rate is **18.2%** and p95 latency is **4,200 ms**.

**Root cause:** Deploy **v2.4.1** went live at 13:30. Logs show a **NullPointerException in `CheckoutController.applyPromo`** starting at 13:42, tied directly to the new release.

**Action taken:** Paged the checkout on-call engineer and opened **INC-4827**.

**Recommended immediate next step:** Roll back checkout to **v2.4.0** to stop the bleeding while the team investigates the `applyPromo` bug.

The model never saw your services directly. It only saw the tools you described and the JSON they returned. This guide builds that assistant from a single call up to the full loop.

Connecting to the endpoint

Tool calling runs over Runware's OpenAI-compatible Chat Completions endpoint at https://api.runware.ai/v1. Point any OpenAI client at that base URL with your Runware API key, and set the model to moonshotai-kimi-k2-6. The request and response follow the standard OpenAI tool-calling format, so existing agent code and frameworks work unchanged.

Python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_RUNWARE_API_KEY",
    base_url="https://api.runware.ai/v1",
)

JavaScript
import OpenAI from 'openai'

const client = new OpenAI({
  apiKey: process.env.RUNWARE_API_KEY,
  baseURL: 'https://api.runware.ai/v1',
})
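
For readers who want to see the request without an SDK in between, here is a minimal sketch using only the Python standard library. It assumes the endpoint follows the usual OpenAI layout, where the client posts to /chat/completions under the base URL and sends the key as a Bearer token. The helper name build_chat_request is hypothetical, and the request is built but not sent.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.runware.ai/v1"


def build_chat_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a Chat Completions POST request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",  # assumed OpenAI-style path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    {
        "model": "moonshotai-kimi-k2-6",
        "messages": [{"role": "user", "content": "Hello"}],
    },
    # Same environment variable the JavaScript example reads.
    api_key=os.environ.get("RUNWARE_API_KEY", "YOUR_RUNWARE_API_KEY"),
)

# To actually send it:
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
```

In practice the SDK client is the simpler choice; the sketch is only meant to show what travels over the wire.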

Anatomy of a tool call

A tool call is a four-message round-trip. You send the conversation plus a list of tools, the model replies with a tool call instead of text, you run the function and send the result back as a tool message, and the model answers. Walk through one call against the payments service:

Request: the conversation plus one tool definition
{
  "model": "moonshotai-kimi-k2-6",
  "messages": [
    { "role": "system", "content": "You are an on-call site-reliability assistant. Before answering, use the provided tools to inspect live service health, recent logs, and recent deploys. Be concise and decisive, cite the numbers you found, and recommend one concrete next action." },
    { "role": "user", "content": "What is the current error rate on the payments service?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_service_health",
        "description": "Return the current health of a service: status, error rate, and p95 latency.",
        "parameters": {
          "type": "object",
          "properties": {
            "service": { "type": "string", "enum": ["checkout", "payments", "search"] }
          },
          "required": ["service"]
        }
      }
    }
  ],
  "tool_choice": "auto",
  "temperature": 0.2,
  "max_completion_tokens": 1024
}

Response: the model asks for a tool call instead of answering
{
  "id": "chatcmpl-1fb9303b72d3939c40a33ecd",
  "object": "chat.completion",
  "model": "moonshotai-kimi-k2-6",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "functions.get_service_health:0",
            "type": "function",
            "function": {
              "name": "get_service_health",
              "arguments": "{\"service\":\"payments\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": { "prompt_tokens": 144, "completion_tokens": 17, "total_tokens": 161 }
}

Your reply: the tool result, sent back as a tool message
{
  "role": "tool",
  "tool_call_id": "functions.get_service_health:0",
  "content": "{\"status\":\"healthy\",\"healthy\":true,\"errorRate\":0.002,\"p95LatencyMs\":180}"
}

Final answer from the model
The payments service is currently **healthy** with an error rate of
**0.2%** (0.002) and a p95 latency of **180 ms**. No immediate action is required.

Three details carry the protocol. When finish_reason is tool_calls, the assistant message has no text to show the user yet: its content is null and the call lives in the tool_calls array. The arguments field is a JSON-encoded string, not an object, so you parse it before use. Your reply is a tool message whose tool_call_id must match the call's id, which is how the model pairs your result with the request it made. Append both the assistant message and your tool message to the history, send it back, and the model answers with finish_reason: "stop".
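
Those rules can be sketched in a few lines of Python. The example works on plain dicts shaped like the response above; get_service_health here is a hypothetical stand-in that returns the values from the walkthrough, and answer_tool_calls is an illustrative helper name, not part of any SDK.

```python
import json


# Hypothetical stand-in for a real health check; returns the walkthrough's values.
def get_service_health(service: str) -> dict:
    return {"status": "healthy", "healthy": True, "errorRate": 0.002, "p95LatencyMs": 180}


TOOL_IMPLS = {"get_service_health": get_service_health}


def answer_tool_calls(assistant_message: dict) -> list[dict]:
    """Run every call in an assistant message and build the matching tool messages."""
    replies = []
    for call in assistant_message.get("tool_calls") or []:
        # arguments arrives as a JSON-encoded string, so decode it first.
        args = json.loads(call["function"]["arguments"])
        result = TOOL_IMPLS[call["function"]["name"]](**args)
        replies.append({
            "role": "tool",
            "tool_call_id": call["id"],  # must echo the call's id exactly
            "content": json.dumps(result, separators=(",", ":")),
        })
    return replies


assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "functions.get_service_health:0",
        "type": "function",
        "function": {"name": "get_service_health", "arguments": "{\"service\":\"payments\"}"},
    }],
}

tool_messages = answer_tool_calls(assistant_message)
# History for the next request: earlier messages + assistant_message + tool_messages.
```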

The agent loop

One call answers one question. Real tasks need a loop, because the model often calls more tools after seeing the first results. The rule is simple: send the conversation, append whatever comes back, and repeat until the model returns no more tool calls.

Python
import json

# SYSTEM_PROMPT, tools, and TOOL_IMPLS (a dict mapping tool names to functions)
# are assumed to be defined elsewhere in your code.
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Checkout users are reporting failed orders in the last few minutes. Find out what is going on and take the appropriate action."},
]

while True:
    response = client.chat.completions.create(
        model="moonshotai-kimi-k2-6",
        messages=messages,
        tools=tools,
    )
    message = response.choices[0].message
    # Keep the assistant message in the history, tool calls included.
    messages.append(message)

    # No tool calls means the model has written its final answer.
    if not message.tool_calls:
        break

    # Run each requested tool and reply with one tool message per call.
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)
        result = TOOL_IMPLS[call.function.name](args)
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        })

print(messages[-1].content)
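
JavaScript
// SYSTEM_PROMPT, tools, and TOOL_IMPLS (an object mapping tool names to functions)
// are assumed to be defined elsewhere in your code.
const messages = [
  { role: 'system', content: SYSTEM_PROMPT },
  { role: 'user', content: 'Checkout users are reporting failed orders in the last few minutes. Find out what is going on and take the appropriate action.' },
]

while (true) {
  const response = await client.chat.completions.create({
    model: 'moonshotai-kimi-k2-6',
    messages,
    tools,
  })
  const message = response.choices[0].message
  // Keep the assistant message in the history, tool calls included.
  messages.push(message)

  // A missing or empty tool_calls array means the model has written its final answer.
  if (!message.tool_calls?.length) break

  for (const call of message.tool_calls) {
    const args = JSON.parse(call.function.arguments)
    // await is harmless for synchronous tools and required for async ones.
    const result = await TOOL_IMPLS[call.function.name](args)
    messages.push({
      role: 'tool',
      tool_call_id: call.id,
      content: JSON.stringify(result),
    })
  }
}

console.log(messages.at(-1).content)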

That loop is what produced the incident response at the top. It ran three rounds:

  • Round 1: From the single instruction, Kimi requested three tools at once: get_service_health, get_deploys, and search_logs, all for checkout. The loop ran them and appended three tool messages.
  • Round 2: With health showing degraded (18.2% errors), a v2.4.1 deploy 12 minutes earlier, and a log line tying the 500s to that release, Kimi made a follow-up call that depended on the first round's results: it paged the checkout team with a written summary.
  • Round 3: After the page returned an incident ID, the model stopped calling tools and wrote its answer.

Round 2 is the part a single call can't do. The model read three tool results, drew a conclusion, and chose a new action based on it:

Round 2: a tool call that depends on round 1's results
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "functions.page_engineer:3",
            "type": "function",
            "function": {
              "name": "page_engineer",
              "arguments": "{\"team\":\"checkout\",\"summary\":\"Checkout degraded after v2.4.1 deploy: 18.2% error rate, NullPointerException in applyPromo\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

Always bound the loop. A model can call tools for several rounds, and a bug in a tool (or a tool that keeps failing) can keep it looping. Cap the number of iterations and stop with a clear error when you hit the cap, rather than relying on the model to always finish on its own.
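
One way to apply that cap is sketched below. The round limit of 8 is an arbitrary assumption, and ToolLoopLimitError, run_agent, and the complete callback are hypothetical names: complete takes the message history and returns the assistant message as a plain dict, so the loop itself stays independent of any client library.

```python
import json
from typing import Callable

MAX_ROUNDS = 8  # assumed cap; tune to the deepest tool chain you expect


class ToolLoopLimitError(RuntimeError):
    """Raised when the model is still requesting tools after the round cap."""


def run_agent(
    complete: Callable[[list], dict],
    messages: list,
    tool_impls: dict,
    max_rounds: int = MAX_ROUNDS,
) -> str:
    """Drive the call loop until the model answers or the round cap is hit."""
    for _ in range(max_rounds):
        message = complete(messages)
        messages.append(message)
        calls = message.get("tool_calls") or []
        if not calls:
            # No tool calls: this is the final answer.
            return message.get("content") or ""
        for call in calls:
            args = json.loads(call["function"]["arguments"])
            result = tool_impls[call["function"]["name"]](args)
            messages.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })
    raise ToolLoopLimitError(f"model still calling tools after {max_rounds} rounds")
```

With the client from earlier, complete could be a thin wrapper around client.chat.completions.create that converts the returned message to a dict. Raising a dedicated exception lets the caller tell a stuck loop apart from an ordinary tool failure.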

Parallel tool calls

When the model needs several independent pieces of information, it requests them as multiple entries in one tool_calls array rather than asking one round at a time. Asked for a health snapshot of three services, Kimi emitted all three calls in a single message:

One assistant message, three tool calls
{
  "message": {
    "role": "assistant",
    "content": null,
    "tool_calls": [
      { "id": "functions.get_service_health:0", "type": "function", "function": { "name": "get_service_health", "arguments": "{\"service\":\"checkout\"}" } },
      { "id": "functions.get_service_health:1", "type": "function", "function": { "name": "get_service_health", "arguments": "{\"service\":\"payments\"}" } },
      { "id": "functions.get_service_health:2", "type": "function", "function": { "name": "get_service_health", "arguments": "{\"service\":\"search\"}" } }
    ]
  },
  "finish_reason": "tool_calls"
}

The loop above already handles this: it iterates every entry in tool_calls and appends one tool message per call. Because each result is matched back by its own tool_call_id, you can run independent calls concurrently and append the results in any order. Once all three came back, the model answered in one line:

**Checkout** is degraded (18.2% error rate, 4.2s p95 latency); **payments** and **search** are healthy.
**Next action:** Investigate checkout immediately—roll back the most recent checkout deploy if one occurred within the latency/error-rate regression window.
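
Running the calls concurrently might look like the following sketch, which uses a thread pool from the standard library. The HEALTH table is a hypothetical stand-in for real lookups (it reuses the figures quoted in this guide and leaves out numbers the guide does not give), and run_calls_concurrently is an illustrative helper name.

```python
import json
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for real health lookups.
HEALTH = {
    "checkout": {"status": "degraded", "errorRate": 0.182, "p95LatencyMs": 4200},
    "payments": {"status": "healthy", "errorRate": 0.002, "p95LatencyMs": 180},
    "search": {"status": "healthy"},
}


def get_service_health(args: dict) -> dict:
    return HEALTH[args["service"]]


TOOL_IMPLS = {"get_service_health": get_service_health}


def run_call(call: dict) -> dict:
    """Execute one tool call and wrap the result as a tool message."""
    args = json.loads(call["function"]["arguments"])
    result = TOOL_IMPLS[call["function"]["name"]](args)
    return {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}


def run_calls_concurrently(tool_calls: list[dict], max_workers: int = 8) -> list[dict]:
    """Run independent tool calls in a thread pool; results keep the request order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_call, tool_calls))


tool_calls = [
    {
        "id": f"functions.get_service_health:{i}",
        "type": "function",
        "function": {"name": "get_service_health", "arguments": json.dumps({"service": name})},
    }
    for i, name in enumerate(["checkout", "payments", "search"])
]
tool_messages = run_calls_concurrently(tool_calls)
```

Threads suit tools that spend their time waiting on network or disk. Tools with side effects that must happen in a fixed order are better run sequentially.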

Controlling tool selection

The tool_choice parameter decides how freely the model reaches for tools. The default, auto, is what every example above uses.

Value | Behavior
"auto" | The model decides whether to call a tool. The default, and the right choice for most assistants.
"none" | The model answers with text and calls no tools, even if some are defined.
"required" | The model must call at least one tool before it can answer with text.
{ "type": "function", "function": { "name": "..." } } | The model must call the named function.

Naming a function is useful when an action is already decided and you only need the model to fill in the arguments. Forcing page_engineer on a paging request guarantees the page fires:

{
  "model": "moonshotai-kimi-k2-6",
  "messages": [ ... ],
  "tools": [ ... ],
  "tool_choice": { "type": "function", "function": { "name": "page_engineer" } }
}

Naming a function guarantees Kimi calls it, but the model may still issue other tool calls it judges necessary in the same turn. In the paging run above, Kimi paged as instructed and also called get_service_health and get_deploys to fill in the incident summary. If you need exactly one call and nothing else, expose only that one tool on the request.
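
A small helper can enforce that last point by trimming the tool list and setting tool_choice together. This is a sketch: only_tool is a hypothetical name, and the tool definitions are cut down to their names for brevity.

```python
def only_tool(tools: list[dict], name: str) -> dict:
    """Return request kwargs that expose and force exactly one tool."""
    selected = [t for t in tools if t["function"]["name"] == name]
    if not selected:
        raise ValueError(f"unknown tool: {name}")
    return {
        "tools": selected,
        "tool_choice": {"type": "function", "function": {"name": name}},
    }


# Trimmed, hypothetical definitions; real ones carry descriptions and parameters.
tools = [
    {"type": "function", "function": {"name": "get_service_health"}},
    {"type": "function", "function": {"name": "get_deploys"}},
    {"type": "function", "function": {"name": "page_engineer"}},
]

request_kwargs = only_tool(tools, "page_engineer")
# client.chat.completions.create(
#     model="moonshotai-kimi-k2-6", messages=messages, **request_kwargs
# )
```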

Designing tools Kimi can use well

The model only ever sees the schemas you hand it, so a tool's name and description are its entire interface. Name functions like API endpoints: get_service_health and search_logs read as actions, whereas handler or doStuff give the model nothing to reason about. Treat the description as instructions for choosing between tools mid-conversation, not as a changelog entry.

Constrain the arguments as tightly as the real inputs allow. Declaring an enum for any fixed set of values and marking the fields you depend on as required is often the difference between a clean call and a malformed one. The service enum in these examples is why Kimi never invented a service name.
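
As an illustration, a tightly constrained definition for page_engineer might look like the sketch below. The team and summary arguments appear in the round 2 call earlier in this guide, but the description text and the team enum values are assumptions made for the example.

```python
import json

PAGE_ENGINEER_TOOL = {
    "type": "function",
    "function": {
        "name": "page_engineer",
        # Assumed wording: tells the model when to pick this tool.
        "description": (
            "Page the on-call engineer for a team. Use only after health, "
            "deploys, or logs show a real incident."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                # Assumed team list, mirroring the service enum used earlier.
                "team": {"type": "string", "enum": ["checkout", "payments", "search"]},
                "summary": {
                    "type": "string",
                    "description": "One or two sentences citing the numbers found.",
                },
            },
            # Both fields are needed to send a useful page.
            "required": ["team", "summary"],
        },
    },
}

# The definition must survive JSON serialization to be sent in the tools array.
serialized = json.dumps(PAGE_ENGINEER_TOOL)
```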

What a tool returns matters as much as what it accepts, because the result becomes the model's next input. Compact JSON objects with named fields like { "status": "degraded", "errorRate": 0.182 } are easier to reason over than prose or large blobs. Keep the overall set small, too: a handful of well-scoped tools beats a large catalog, since overlapping tools force the model to guess which one you meant.
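
For a tool that could return a large blob, such as a log search, one option is to summarize before replying. The sketch below is hypothetical throughout: the field names, the five-line cap, and compact_log_result are illustrative choices, not the shape the search_logs tool in this guide actually returns.

```python
import json

MAX_LOG_LINES = 5  # assumed cap on how many raw lines the model sees


def compact_log_result(service: str, lines: list[str]) -> str:
    """Summarize a log search as a small JSON object instead of a raw dump."""
    result = {
        "service": service,
        "matchCount": len(lines),
        "sample": lines[:MAX_LOG_LINES],
        "truncated": len(lines) > MAX_LOG_LINES,
    }
    # Compact separators drop whitespace the model does not need.
    return json.dumps(result, separators=(",", ":"))


content = compact_log_result("checkout", [f"log line {i}" for i in range(12)])
```

The truncated flag tells the model that more matches exist, so it can decide whether to search again with a narrower query.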

Tips

  1. Echo every tool call back into the history. Append both the assistant message that requested the call and your tool reply. Dropping the assistant message breaks the tool_call_id pairing, and the next request will fail.

  2. Parse arguments defensively. The arguments field is a JSON string the model generated. Validate it against your schema before executing, and return an error object as the tool result if it doesn't fit, so the model can correct itself on the next round.

  3. Return errors as tool results, not exceptions. When a tool fails, send { "error": "..." } back as the tool message. The model can read that and adjust, where a thrown exception just ends the loop.

  4. Watch cached_tokens on long conversations. The system prompt and tool definitions repeat on every round, so Runware's prompt cache reuses them: across the incident's three rounds, usage.prompt_tokens_details.cached_tokens climbed from 0 to 608. Keeping the stable parts of your prompt in a fixed prefix maximizes that reuse.

  5. Set the action threshold in the system prompt. Whether the model gathers context before acting, or acts immediately, is shaped by instructions like "inspect health and logs before taking action." Tighten or loosen that line to match how much autonomy you want.
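
Tips 2 and 3 can be combined in one wrapper around tool execution, sketched here. The check covers only required fields and enum values and is not a full JSON Schema validator; check_args and safe_tool_message are hypothetical helper names, and schemas is assumed to map each tool name to its parameters object.

```python
import json


def check_args(args: object, parameters: dict) -> list[str]:
    """Minimal check of required fields and enum values (not full JSON Schema)."""
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]
    problems = [
        f"missing required field: {name}"
        for name in parameters.get("required", [])
        if name not in args
    ]
    for name, spec in parameters.get("properties", {}).items():
        if name in args and "enum" in spec and args[name] not in spec["enum"]:
            problems.append(f"{name} must be one of {spec['enum']}")
    return problems


def safe_tool_message(call: dict, impls: dict, schemas: dict) -> dict:
    """Run one tool call, turning bad input or failures into an error result."""
    name = call["function"]["name"]
    try:
        if name not in impls:
            raise ValueError(f"unknown tool: {name}")
        args = json.loads(call["function"]["arguments"])
        problems = check_args(args, schemas[name])
        result = {"error": "; ".join(problems)} if problems else impls[name](args)
    except Exception as exc:
        # Report the failure to the model instead of ending the loop.
        result = {"error": f"{type(exc).__name__}: {exc}"}
    return {"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)}
```

Because every path returns a tool message with the original tool_call_id, the history stays valid even when a call fails, and the model gets a readable reason to retry with different arguments.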