
Z.ai
Multimodal AI models for image generation and video synthesis
Z.ai develops multimodal foundation models spanning image generation, video synthesis, and visual understanding. Its GLM-Image model combines autoregressive and diffusion architectures for high-fidelity output with strong text rendering. As a Runware provider, Z.ai models are available through a single inference pipeline alongside models from other creators.
Models by Z.ai
GLM-5.1 is Z.ai’s flagship language model for agentic engineering, coding, reasoning, and tool-driven workflows. It supports a 200K-token context window with up to 128K output tokens, plus deep thinking, function calling, structured output, and streaming tool calls. It is designed to stay effective across long multi-step sessions rather than only short-horizon tasks.
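A minimal sketch of what a tool-calling turn with GLM-5.1 could look like, assuming an OpenAI-style chat completions endpoint. The URL, model identifier, auth header, and the get_weather tool are placeholders for illustration, not documented Runware or Z.ai API details:

```python
# Hypothetical sketch: declaring a tool so GLM-5.1 can emit a structured
# tool call. Endpoint, model id, and auth are placeholders.
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
headers = {"Authorization": "Bearer YOUR_API_KEY"}        # placeholder auth

payload = {
    "model": "glm-5.1",  # placeholder model identifier
    "messages": [
        {"role": "user", "content": "What is the weather in Berlin right now?"}
    ],
    # Declare a callable tool so the model can return a structured tool
    # call instead of answering from memory.
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

response = requests.post(API_URL, json=payload, headers=headers, timeout=60)
response.raise_for_status()
message = response.json()["choices"][0]["message"]

# If the model chose to call the tool, the arguments arrive as JSON text.
for call in message.get("tool_calls", []):
    print(call["function"]["name"], call["function"]["arguments"])
```

In a full agent loop, the application would execute the returned tool call, append the result as a tool message, and let the model continue the turn.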
GLM-Image is an open-source image generation model that combines an autoregressive image-token generator with a diffusion decoder to produce high-fidelity results with strong prompt adherence. It is especially strong at accurate text rendering inside images and knowledge-intensive compositions, and it supports image-to-image generation for instruction-driven edits within a single unified model.
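A minimal sketch of an instruction-driven image-to-image edit with GLM-Image, assuming a task-style HTTP endpoint. The endpoint, task field names, and model identifier are illustrative placeholders rather than exact API documentation:

```python
# Hypothetical sketch: editing an existing image with a text instruction.
# Endpoint, field names, and model identifier are placeholders.
import uuid
import requests

API_URL = "https://api.example.com/v1"  # placeholder endpoint
headers = {"Authorization": "Bearer YOUR_API_KEY"}  # placeholder auth

task = {
    "taskType": "imageInference",          # assumed task name
    "taskUUID": str(uuid.uuid4()),
    "model": "zai:glm-image@1",            # placeholder model identifier
    "positivePrompt": "Replace the sign text with 'GRAND OPENING'",
    "seedImage": "https://example.com/storefront.png",  # source image for the edit
    "width": 1024,
    "height": 1024,
}

# The request body is a list of tasks; the response carries the result URLs.
response = requests.post(API_URL, json=[task], headers=headers, timeout=120)
response.raise_for_status()
print(response.json())
```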
GLM-4.7 is a 358-billion-parameter Mixture-of-Experts language model from Z.ai optimized for agentic coding, complex reasoning, and long-horizon tasks. It features interleaved thinking, preserved thinking for multi-turn consistency, and turn-level thinking control. It supports a 200K-token context window with up to 128K output tokens and tool calling, and it scores 73.8% on SWE-bench Verified.
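A minimal sketch of turn-level thinking control, assuming a chat endpoint that accepts a per-request thinking toggle. The endpoint, model identifier, and the thinking field are assumptions for illustration, not confirmed parameters:

```python
# Hypothetical sketch: enabling deep thinking only on the turns that need
# it. Endpoint, model id, and the `thinking` field are assumptions.
import requests

API_URL = "https://api.example.com/v1/chat/completions"  # placeholder endpoint
headers = {"Authorization": "Bearer YOUR_API_KEY"}        # placeholder auth

def ask(messages, think: bool) -> str:
    """Send one chat turn, toggling deep thinking for just that turn."""
    payload = {
        "model": "glm-4.7",  # placeholder model identifier
        "messages": messages,
        "thinking": {"type": "enabled" if think else "disabled"},  # assumed field
    }
    r = requests.post(API_URL, json=payload, headers=headers, timeout=120)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

history = [{"role": "user", "content": "Plan a refactor of the auth module."}]
plan = ask(history, think=True)   # hard step: enable deep thinking

history += [
    {"role": "assistant", "content": plan},
    {"role": "user", "content": "Summarize the plan in one line."},
]
print(ask(history, think=False))  # easy step: skip thinking to cut latency
```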


