MODEL IDgoogle-gemma-4-31b
live

Gemma 4 31B

Google
by Google131K context

Gemma 4 31B is Google's flagship dense open-weights model in the Gemma 4 family. It combines strong reasoning, coding performance, native function calling, multimodal understanding across text, image, and video, and a 256K context window in a 31B-parameter open model designed for local and cloud deployment.

Gemma 4 31B

API Options

Platform-level options for task execution and delivery.

taskType

stringrequiredvalue: textInference

Identifier for the type of task being performed

taskUUID

stringrequiredUUID v4

UUID v4 identifier for tracking tasks and matching async responses. Must be unique per task.

outputFormat

stringdefault: TEXT

Specifies the file format of the generated output. The available values depend on the task type and the specific model's capabilities.

    Allowed values1 value

    webhookURL

    stringURI

    Specifies a webhook URL where JSON responses will be sent via HTTP POST when generation tasks complete. For batch requests with multiple results, each completed item triggers a separate webhook call as it becomes available.

    Learn more1 resource

    deliveryMethod

    stringdefault: sync

    Determines how the API delivers task results.

    Allowed values3 values
    Returns complete results directly in the API response.
    Returns an immediate acknowledgment with the task UUID. Poll for results using getResponse.
    Streams results token-by-token as they are generated.
    Learn more1 resource

    includeCost

    booleandefault: false

    Include task cost in the response.

    includeUsage

    booleandefault: false

    Include token usage statistics in the response.

    numberResults

    integermin: 1max: 4default: 1

    Number of results to generate. Each result uses a different seed, producing variations of the same parameters.

    Inputs

    Input resources for the task (images, audio, etc). These must be nested inside the inputs object.

    inputs » images

    images

    array of stringsmin items: 1

    Array of image inputs (UUID, URL, Data URI, or Base64).

    inputs » videos

    videos

    array of stringsmin items: 1

    Array of video inputs (UUID, URL, or Base64).

    Core Parameters

    Primary parameters that define the task output.

    model

    stringrequiredvalue: google-gemma-4-31b

    Identifier of the model to use for generation.

    Learn more3 resources

    seed

    integermin: 0max: 9223372036854776000

    Random seed for reproducible generation. When not provided, a random seed is generated in the unsigned 32-bit range.

    Learn more1 resource

    messages

    array of objectsrequiredmin items: 1

    Array of chat messages forming the conversation context. The final message must use the user role.

    Properties2 properties
    messages » role

    role

    stringrequired

    The role of the message author.

    Allowed values2 values
    messages » content

    content

    stringrequiredmin: 1

    The text content of the message.

    Settings

    Technical parameters to fine-tune the inference process. These must be nested inside the settings object.

    settings » systemPrompt

    systemPrompt

    stringmin: 1max: 50000

    System-level instruction that guides the model's behavior and output style across the entire generation.

    settings » temperature

    temperature

    floatmin: 0max: 2step: 0.01

    Controls randomness in generation. Lower values produce more deterministic outputs, higher values increase variation and creativity.

    settings » topP

    topP

    floatmin: 0max: 1step: 0.01

    Nucleus sampling parameter that controls diversity by limiting the probability mass. Lower values make outputs more focused, higher values increase diversity.

    settings » frequencyPenalty

    frequencyPenalty

    floatmin: 0max: 2step: 0.01default: 0

    Penalizes tokens based on their frequency in the output so far. A value of 0.0 disables the penalty.

    settings » maxTokens

    maxTokens

    integermin: 1

    Maximum number of tokens to generate in the response.

    settings » minP

    minP

    floatmin: 0max: 1step: 0.01default: 0

    Minimum probability threshold. Tokens with probability below this value are excluded from sampling.

    settings » presencePenalty

    presencePenalty

    floatmin: -2max: 2step: 0.01default: 0

    Encourages the model to introduce new topics. A value of 0.0 disables the penalty.

    settings » repetitionPenalty

    repetitionPenalty

    floatmin: 0max: 2step: 0.01default: 1

    Penalizes tokens that have already appeared in the output. A value of 1.0 disables the penalty.

    settings » stopSequences

    stopSequences

    array of stringsmin: 1

    Array of sequences that will cause the model to stop generating further tokens when encountered.

    settings » thinkingLevel

    thinkingLevel

    stringdefault: high

    Controls the depth of internal reasoning the model performs before generating a response.

    Allowed values2 values
    settings » topK

    topK

    integermin: 1max: 100

    Top-K sampling parameter that limits the number of highest-probability tokens considered at each step.