xAI
xAI

Grok 4.3

Flagship multimodal LLM for agentic tool calling, long context, and low-hallucination reasoning

Text to TextImage to Text

Grok 4.3 Overview

Grok 4.3 is xAI's flagship language model for agentic reasoning, strong instruction following, and minimal hallucinations. It supports text and image input, a 1 million token context window, configurable reasoning effort including non-reasoning mode, function calling, and structured outputs for production assistants, coding workflows, and long-context analysis.

How to Use Grok 4.3

Overview

Grok 4.3 is xAI's flagship multimodal language model built for long-context reasoning, strong tool use, and low-hallucination instruction following.

It is a strong fit for applications that need a high-capability general-purpose model with image understanding, configurable reasoning depth, and reliable behavior in agentic workflows.

Strengths

Strong Agentic Tool Calling

Grok 4.3 is positioned around strong tool calling and agentic execution. It is built for workflows where the model must not only answer, but also coordinate actions, query systems, and operate inside multi-step application logic.

Low-Hallucination Behavior

xAI positions Grok 4.3 as a flagship model with a very low hallucination rate. This makes it a strong fit for professional workflows where factual reliability and instruction accuracy matter more than pure generative style.

Configurable Reasoning

The model supports configurable reasoning effort, including none, low, medium, and high. That gives it useful range across latency-sensitive requests and deeper reasoning-heavy tasks.

Very Large Context Window

Grok 4.3 supports a 1 million token context window. It is useful for long documents, large retrieval payloads, multi-file coding work, and analysis pipelines where a substantial working set must remain in scope.

Multimodal Understanding

The model supports both text and image input. It is useful for visual question answering, screenshot analysis, document understanding, and text-plus-image workflows where visual context needs to be incorporated into the response.

Structured Production Workflows

Grok 4.3 supports structured outputs and function calling, which makes it suitable for applications that need predictable orchestration, schema-constrained responses, and production-grade integration patterns.

Capabilities

Text-to-Text

Grok 4.3 handles general text generation, reasoning, summarization, planning, coding assistance, and structured response workflows.

Image-to-Text

The model can take image input and use visual context as part of a larger text-based reasoning task.

Function Calling

Grok 4.3 supports function calling and structured outputs, making it well suited to tool-using assistants and agent-style systems.

Input and Output

  • AIR ID: xai:[email protected]
  • Input: text and image input
  • Output: text
  • Context window: 1,000,000 tokens
  • Reasoning effort: none, low, medium, high
  • Structured outputs: supported
  • Function calling: supported

Best Fit

  • Tool-using assistants
  • Coding and debugging workflows
  • Long-context analysis
  • Image-grounded reasoning
  • Production systems that need structured outputs

More models from xAI

Grok Imagine Image Quality is xAI's quality-focused image generation and editing model. It is designed for higher realism, stronger multilingual text rendering, tighter prompt following, deeper scene understanding, and more consistent brand-oriented output across both text-to-image and image editing workflows.

xAI Text-to-Speech converts text into natural-sounding spoken audio with a single API call. It offers five expressive voices (Eve, Ara, Leo, Rex, and Sal), inline speech tags for fine-grained control over pauses, laughter, whispers, and emphasis, and supports over 20 auto-detected languages.

Grok Imagine Image Pro is the higher quality variant of the Grok Imagine image model developed by xAI. It generates detailed images from text prompts and supports iterative editing of existing images through natural language instructions. The model provides stronger prompt adherence, improved rendering quality, and more reliable control over composition, style, and aspect ratio. It supports multiple image styles and resolutions up to 2K, enabling workflows for design, illustration, and creative prototyping.

Grok Imagine Image is a multimodal generative image model that creates high-quality still images from text prompts or image inputs. It supports flexible visual synthesis across a range of styles, enabling developers to generate creative imagery directly from structured prompts or to expand on existing visuals with coherent, detailed outputs.

Grok Imagine Video is a multimodal generative video model that produces short video clips with native audio from text descriptions or static images. It supports text-to-video and image-to-video generation with synchronized sound effects and dialogue, enabling developers to animate scenes with motion, camera dynamics, and audio in a single API workflow.