Model ID: google-gemma-4-31b

api-only

Gemma 4 31B

by Google, April 2, 2026

Gemma 4 31B is Google's flagship dense open-weights model in the Gemma 4 family. It combines strong reasoning, coding performance, native function calling, multimodal understanding across text, image, and video, and a 256K context window in a 31B-parameter open model designed for local and cloud deployment.

API Options

Platform-level options for task execution and delivery.

taskType string required value: textInference: Identifier for the type of task being performed. For this model the value must be textInference.

taskUUID string required UUID v4: UUID v4 identifier for tracking tasks and matching async responses. Must be unique per task.
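The two required platform fields above can be sketched as follows. This is a minimal illustration, not an official client: the helper name `new_task_envelope` is hypothetical, and only the field names and the `textInference` value come from this reference.

```python
import uuid
from typing import Dict


def new_task_envelope() -> Dict[str, str]:
    """Return the two required platform-level fields for one task.

    Hypothetical helper for illustration; not part of any official SDK.
    """
    return {
        "taskType": "textInference",
        # uuid.uuid4() draws a fresh random UUID v4 on every call,
        # so each task gets its own identifier.
        "taskUUID": str(uuid.uuid4()),
    }


first = new_task_envelope()
second = new_task_envelope()
print(first)
```

Generating the UUID at the moment the task is built, rather than reusing a stored value, is one simple way to satisfy the "unique per task" requirement.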

outputFormat string default: TEXT

Specifies the file format of the generated output. The available values depend on the task type and the specific model's capabilities.

Allowed values: TEXT

webhookURL string URI

Specifies a webhook URL where JSON responses will be sent via HTTP POST when generation tasks complete. For batch requests with multiple results, each completed item triggers a separate webhook call as it becomes available.


deliveryMethod string default: sync

Determines how the API delivers task results.

Allowed values (3 values):

Returns complete results directly in the API response.
Returns an immediate acknowledgment with the task UUID. Poll for results using getResponse.
Streams results token-by-token as they are generated.
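The polling flow for the acknowledgment-then-poll delivery mode might look like the sketch below. Several things here are assumptions rather than documented behavior: that `getResponse` is sent as a task with its own `taskType`, that results come back under a `data` key, and that each result echoes the `taskUUID`. The `send` callable stands in for whatever HTTP transport you use, and the fake transport exists only to make the example runnable.

```python
import time
from typing import Any, Callable, Dict, List

Transport = Callable[[List[Dict[str, Any]]], Dict[str, Any]]


def poll_for_result(
    send: Transport,
    task_uuid: str,
    interval_s: float = 1.0,
    max_attempts: int = 30,
) -> List[Dict[str, Any]]:
    """Poll until at least one result for task_uuid is available.

    ASSUMPTION: the request and response shapes below are illustrative
    guesses, not taken from the API reference.
    """
    request = [{"taskType": "getResponse", "taskUUID": task_uuid}]
    for _ in range(max_attempts):
        response = send(request)
        results = [
            item
            for item in response.get("data", [])
            if item.get("taskUUID") == task_uuid
        ]
        if results:
            return results
        time.sleep(interval_s)  # wait before asking again
    raise TimeoutError(f"no result for task {task_uuid}")


# Fake transport for demonstration: empty twice, then one result.
calls: List[List[Dict[str, Any]]] = []


def fake_send(request: List[Dict[str, Any]]) -> Dict[str, Any]:
    calls.append(request)
    if len(calls) < 3:
        return {"data": []}
    return {"data": [{"taskUUID": "demo-task", "text": "done"}]}


results = poll_for_result(fake_send, "demo-task", interval_s=0, max_attempts=5)
print(results)
```

Bounding the loop with `max_attempts` keeps a lost task from polling forever; a production client would likely add backoff as well.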


includeCost boolean default: false: Include task cost in the response.

includeUsage boolean default: false: Include token usage statistics in the response.

numberResults integer min: 1 max: 4 default: 1: Number of results to generate. Each result uses a different seed, producing variations of the same parameters.

Inputs

Input resources for the task (images, audio, etc.). These must be nested inside the inputs object.

inputs » images array of strings min items: 1: Array of image inputs (UUID, URL, Data URI, or Base64).

inputs » videos array of strings min items: 1: Array of video inputs (UUID, URL, or Base64).
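As a rough sketch of the nesting rule, the snippet below builds an inputs object with one image passed as a Data URI and one video passed as a URL. The helper names `image_data_uri` and `build_inputs`, the example URL, and the placeholder bytes are all hypothetical; only the `inputs`, `images`, and `videos` keys and the minimum of one item come from this reference.

```python
import base64
from typing import Dict, List, Optional, Sequence


def image_data_uri(raw: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a Data URI (RFC 2397 style)."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_inputs(
    images: Optional[Sequence[str]] = None,
    videos: Optional[Sequence[str]] = None,
) -> Dict[str, Dict[str, List[str]]]:
    """Nest image and video references inside an `inputs` object."""
    inputs: Dict[str, List[str]] = {}
    for key, values in (("images", images), ("videos", videos)):
        if values is None:
            continue
        if len(values) < 1:
            # The reference lists "min items: 1" for both arrays.
            raise ValueError(f"{key} must contain at least one item")
        inputs[key] = list(values)
    return {"inputs": inputs}


# Placeholder bytes (the PNG file signature), not a complete image.
png_signature = b"\x89PNG\r\n\x1a\n"
payload = build_inputs(
    images=[image_data_uri(png_signature)],
    videos=["https://example.com/clip.mp4"],  # hypothetical URL
)
print(payload)
```

Note that the reference lists Data URI as an accepted form for images only, so the video here is passed by URL.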

Core Parameters

Primary parameters that define the task output.

model string required value: google-gemma-4-31b: Identifier of the model to use for generation.

seed integer min: 0 max: 9223372036854776000: Random seed for reproducible generation. When not provided, a random seed is generated in the unsigned 32-bit range.

messages array of objects required min items: 1

Array of chat messages forming the conversation context.


messages » role string required: The role of the message author.

Allowed values 2 values

messages » content string required min: 1: The text content of the message.
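The core parameters can be assembled along these lines. This is a sketch under stated assumptions: the helper names are hypothetical, and the role value `"user"` is assumed for illustration because the allowed role values are not listed in this reference. The seed bound is copied as printed above.

```python
from typing import Any, Dict, List, Optional

MODEL_ID = "google-gemma-4-31b"
SEED_MAX = 9223372036854776000  # upper bound as printed in this reference


def make_message(role: str, content: str) -> Dict[str, str]:
    """Build one chat message; content must be at least 1 character."""
    if len(content) < 1:
        raise ValueError("content must not be empty")
    return {"role": role, "content": content}


def core_parameters(
    messages: List[Dict[str, str]],
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble model, messages, and (optionally) seed."""
    if len(messages) < 1:
        raise ValueError("messages must contain at least one item")
    params: Dict[str, Any] = {"model": MODEL_ID, "messages": list(messages)}
    if seed is not None:
        if not 0 <= seed <= SEED_MAX:
            raise ValueError("seed out of range")
        params["seed"] = seed
    # When seed is omitted, the platform picks a random one.
    return params


# ASSUMPTION: "user" is a valid role value; it is not confirmed here.
params = core_parameters(
    [make_message("user", "Summarize the plot of Hamlet in two sentences.")],
    seed=42,
)
unseeded = core_parameters([make_message("user", "Hello")])
print(params)
```

Passing an explicit seed is what makes a generation reproducible; leaving it out keeps the key off the request entirely.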

Settings

Technical parameters to fine-tune the inference process. These must be nested inside the settings object.

settings » systemPrompt string min: 1 max: 50000: System-level instruction that guides the model's behavior and output style across the entire generation.

settings » temperature float min: 0 max: 2 step: 0.01: Controls randomness in generation. Lower values produce more deterministic outputs, higher values increase variation and creativity.

settings » topP float min: 0 max: 1 step: 0.01: Nucleus sampling parameter that controls diversity by limiting the probability mass. Lower values make outputs more focused, higher values increase diversity.

settings » frequencyPenalty float min: 0 max: 2 step: 0.01 default: 0: Penalizes tokens based on their frequency in the output so far. A value of 0.0 disables the penalty.

settings » maxTokens integer min: 1: Maximum number of tokens to generate in the response.

settings » minP float min: 0 max: 1 step: 0.01 default: 0: Minimum probability threshold. Tokens with probability below this value are excluded from sampling.

settings » presencePenalty float min: -2 max: 2 step: 0.01 default: 0: Encourages the model to introduce new topics. A value of 0.0 disables the penalty.

settings » repetitionPenalty float min: 0 max: 2 step: 0.01 default: 1: Penalizes tokens that have already appeared in the output. A value of 1.0 disables the penalty.

settings » stopSequences array of strings min: 1: Array of sequences that will cause the model to stop generating further tokens when encountered.

settings » thinkingLevel string default: high: Controls the depth of internal reasoning the model performs before generating a response.

Allowed values 2 values

settings » topK integer min: 1 max: 100: Top-K sampling parameter that limits the number of highest-probability tokens considered at each step.
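Because every numeric setting above has a documented range, a client can check a settings object before sending it. The sketch below is one possible way to do that; `validate_settings` and `NUMERIC_RANGES` are hypothetical names, and the server remains the authority on what is accepted.

```python
from typing import Any, Dict, List, Optional, Tuple

# (min, max) per setting, copied from the ranges listed above.
# None means no upper bound is documented.
NUMERIC_RANGES: Dict[str, Tuple[float, Optional[float]]] = {
    "temperature": (0, 2),
    "topP": (0, 1),
    "frequencyPenalty": (0, 2),
    "minP": (0, 1),
    "presencePenalty": (-2, 2),
    "repetitionPenalty": (0, 2),
    "topK": (1, 100),
    "maxTokens": (1, None),
}


def validate_settings(settings: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable range violations (empty if none)."""
    errors: List[str] = []
    for name, (low, high) in NUMERIC_RANGES.items():
        if name not in settings:
            continue  # every setting is optional
        value = settings[name]
        if value < low or (high is not None and value > high):
            errors.append(f"{name}={value} is outside [{low}, {high}]")
    prompt = settings.get("systemPrompt")
    if prompt is not None and not 1 <= len(prompt) <= 50000:
        errors.append("systemPrompt length must be between 1 and 50000")
    return errors


ok_errors = validate_settings(
    {"temperature": 0.7, "topP": 0.9, "topK": 40, "maxTokens": 512}
)
bad_errors = validate_settings(
    {"temperature": 2.5, "topK": 0, "systemPrompt": ""}
)
print(ok_errors)
print(bad_errors)
```

A validated object would then be sent under the `settings` key of the task, for example `{"settings": {"temperature": 0.7, "topK": 40}}`.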