MODEL ID zai-glm-4-7

live

GLM-4.7

by Z.ai December 22, 2025

GLM-4.7 is a 358 billion parameter Mixture-of-Experts language model from Z.ai optimized for agentic coding, complex reasoning, and long-horizon tasks. It features interleaved thinking, preserved thinking for multi-turn consistency, and turn-level thinking control. It supports a 200K token context window with 128K max output, tool calling, and achieves 73.8% on SWE-bench Verified.

API Options

Platform-level options for task execution and delivery.

taskType string required value: textInference: Identifier for the type of task being performed

taskUUID string required UUID v4: UUID v4 identifier for tracking tasks and matching async responses. Must be unique per task.

outputFormat string default: TEXT

Specifies the file format of the generated output. The available values depend on the task type and the specific model's capabilities.

Allowed values 1 value

webhookURL string URI

Specifies a webhook URL where JSON responses will be sent via HTTP POST when generation tasks complete. For batch requests with multiple results, each completed item triggers a separate webhook call as it becomes available.

Learn more 1 resource

Webhooks
PLATFORM

deliveryMethod string default: sync

Determines how the API delivers task results.

Allowed values 3 values

: Returns complete results directly in the API response.
: Returns an immediate acknowledgment with the task UUID. Poll for results using getResponse.
: Streams results token-by-token as they are generated.

Learn more 1 resource

Task Polling
PLATFORM

includeCost boolean default: false: Include task cost in the response.

includeUsage boolean default: false: Include token usage statistics in the response.

numberResults integer min: 1 max: 4 default: 1: Number of results to generate. Each result uses a different seed, producing variations of the same parameters.

Core Parameters

Primary parameters that define the task output.

model string required value: zai-glm-4-7: Identifier of the model to use for generation.

seed integer min: 0 max: 4294967295: Random seed for reproducible generation. When not provided, a random seed is generated in the unsigned 32-bit range.

messages array of objects required min items: 1

Array of chat messages forming the conversation context.

Properties 2 properties

messages » role role string required: The role of the message author.

Allowed values 2 values

messages » content content string required min: 1: The text content of the message.

Settings

Technical parameters to fine-tune the inference process. These must be nested inside the settings object.

settings » systemPrompt systemPrompt string min: 1 max: 200000: System-level instruction that guides the model's behavior and output style across the entire generation.

settings » temperature temperature float min: 0 max: 2 step: 0.01 default: 1: Controls randomness in generation. Lower values produce more deterministic outputs, higher values increase variation and creativity.

settings » topP topP float min: 0 max: 1 step: 0.01 default: 0.95: Nucleus sampling parameter that controls diversity by limiting the probability mass. Lower values make outputs more focused, higher values increase diversity.

settings » frequencyPenalty frequencyPenalty float min: 0 max: 2 step: 0.01 default: 0: Penalizes tokens based on their frequency in the output so far. A value of 0.0 disables the penalty.

settings » maxTokens maxTokens integer min: 1 max: 131072 default: 32768: Maximum number of tokens to generate in the response.

settings » minP minP float min: 0 max: 1 step: 0.01 default: 0: Minimum probability threshold. Tokens with probability below this value are excluded from sampling.

settings » presencePenalty presencePenalty float min: 0 max: 2 step: 0.01 default: 0: Encourages the model to introduce new topics. A value of 0.0 disables the penalty.

settings » repetitionPenalty repetitionPenalty float min: 0 max: 2 step: 0.01 default: 1: Penalizes tokens that have already appeared in the output. A value of 1.0 disables the penalty.

settings » stopSequences stopSequences array of strings min: 1: Array of sequences that will cause the model to stop generating further tokens when encountered.

settings » thinkingLevel thinkingLevel string default: none: Controls the depth of internal reasoning the model performs before generating a response.

Allowed values 4 values

settings » topK topK integer min: 1 default: -1: Top-K sampling parameter that limits the number of highest-probability tokens considered at each step.

GLM-4.7

API Options

taskType

taskUUID

outputFormat

webhookURL

deliveryMethod

includeCost

includeUsage

numberResults

Core Parameters

model

seed

messages

role

content

Settings

systemPrompt

temperature

topP

frequencyPenalty

maxTokens

minP

presencePenalty

repetitionPenalty

stopSequences

thinkingLevel

topK

taskType

taskUUID

text

cost

finishReason

usage

promptTokens

completionTokens

totalTokens

thinkingTokens