Gemma 4 31B

Gemma 4 31B is Google's flagship dense open-weights model in the Gemma 4 family. It combines strong reasoning and coding performance, native function calling, multimodal understanding (text, image, and video), and a 256K-token context window in a 31B-parameter open model designed for both local and cloud deployment.

API Options
Platform-level options for task execution and delivery.
taskType
string, required, value: textInference
Identifier for the type of task being performed.
taskUUID
string, required, format: UUID v4
Identifier for tracking tasks and matching asynchronous responses. Must be unique per task.
outputFormat
string, default: TEXT
Specifies the file format of the generated output. The available values depend on the task type and the model's capabilities.
Allowed values: TEXT
webhookURL
string (URI)
Webhook URL to which JSON responses are sent via HTTP POST when generation tasks complete. For batch requests with multiple results, each completed item triggers a separate webhook call as soon as it is available.
deliveryMethod
string, default: sync
Determines how the API delivers task results.
Allowed values (3):
- Returns complete results directly in the API response.
- Returns an immediate acknowledgment with the task UUID. Poll for results using getResponse.
- Streams results token-by-token as they are generated.
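With a non-sync delivery method or a webhookURL, results arrive separately from the request, so the client has to route each one back to its task by taskUUID. The sketch below assumes every result object (polled or POSTed to the webhook) is JSON that carries the taskUUID of its task. The `PendingTasks` class is hypothetical, and the actual getResponse call and the HTTP server are left out. Keeping a list per task allows for several results when numberResults is greater than 1.

```python
import json
import uuid


class PendingTasks:
    """Hypothetical helper: tracks submitted tasks and files incoming results by taskUUID.

    Assumes each result object includes the taskUUID sent with the original task.
    """

    def __init__(self):
        self._results = {}  # taskUUID -> list of result objects received so far

    def register(self):
        # A fresh UUID v4 per task, as the taskUUID parameter requires.
        task_uuid = str(uuid.uuid4())
        self._results[task_uuid] = []
        return task_uuid

    def deliver(self, raw_body):
        # Parse one webhook body or polled result and append it to its task.
        result = json.loads(raw_body)
        task_uuid = result.get("taskUUID")
        if task_uuid not in self._results:
            raise KeyError(f"unknown taskUUID: {task_uuid}")
        self._results[task_uuid].append(result)
        return task_uuid

    def results_for(self, task_uuid):
        # Return a copy so callers cannot mutate the internal list.
        return list(self._results[task_uuid])
```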
includeCost
boolean, default: false
Include the task cost in the response.
includeUsage
boolean, default: false
Include token usage statistics in the response.
numberResults
integer, min: 1, max: 4, default: 1
Number of results to generate. Each result uses a different seed, producing variations from the same request parameters.
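A minimal sketch of assembling the platform-level options above into a task object, assuming the task is sent as a JSON object using these exact keys. `build_task_options` is a hypothetical helper. Only the documented default deliveryMethod (sync) is named, because this page does not list the other two values, so any other value is passed through unchecked.

```python
import uuid


def build_task_options(delivery_method="sync", number_results=1,
                       include_cost=False, include_usage=False,
                       webhook_url=None):
    """Hypothetical helper: platform-level API options for one textInference task."""
    # numberResults is documented as 1..4.
    if not 1 <= number_results <= 4:
        raise ValueError("numberResults must be between 1 and 4")
    options = {
        "taskType": "textInference",
        "taskUUID": str(uuid.uuid4()),  # must be unique per task
        "outputFormat": "TEXT",
        "deliveryMethod": delivery_method,
        "includeCost": include_cost,
        "includeUsage": include_usage,
        "numberResults": number_results,
    }
    # webhookURL is optional, so only include it when given.
    if webhook_url is not None:
        options["webhookURL"] = webhook_url
    return options
```

Generating the UUID inside the helper guarantees a fresh taskUUID for every call, which is what makes matching asynchronous responses reliable.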
Inputs
Input resources for the task (images, audio, etc.). These must be nested inside the inputs object.
Core Parameters
Primary parameters that define the task output.
model
string, required, value: google-gemma-4-31b
Identifier of the model to use for generation.
seed
integer, min: 0, max: 9223372036854776000
Random seed for reproducible generation. When not provided, a random seed is generated in the unsigned 32-bit range.
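A client that wants to replay a generation later can choose and record its own seed instead of relying on the server-generated one. This sketch mirrors the documented fallback by drawing from the unsigned 32-bit range. `pick_seed` is a hypothetical helper, and it only enforces the documented minimum of 0.

```python
import secrets

UINT32_MAX = 2**32 - 1


def pick_seed(seed=None):
    """Hypothetical helper: return an explicit seed, drawing one in the
    unsigned 32-bit range when none is given, so it can be logged and reused."""
    if seed is None:
        return secrets.randbelow(UINT32_MAX + 1)  # 0 .. 2**32 - 1 inclusive
    if seed < 0:
        raise ValueError("seed must be >= 0")
    return seed
```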
messages
array of objects, required, min items: 1
Array of chat messages forming the conversation context. The final message must use the user role.
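A client-side check of the two documented constraints on messages, useful before sending a request. The message shape ({"role": ..., "content": ...}) is an assumption, since this page does not spell out the message schema, and `validate_messages` is a hypothetical helper.

```python
def validate_messages(messages):
    """Hypothetical helper: enforce the documented rules on the messages array.

    Assumes each message is a dict with "role" and "content" keys; only the
    two documented rules are checked (at least one item, final role "user").
    """
    if not isinstance(messages, list) or len(messages) < 1:
        raise ValueError("messages must contain at least one message")
    if messages[-1].get("role") != "user":
        raise ValueError("the final message must use the user role")
    return messages
```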
Settings
Technical parameters to fine-tune the inference process. These must be nested inside the settings object.
systemPrompt
string, min: 1, max: 50000
System-level instruction that guides the model's behavior and output style across the entire generation.
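The system prompt and sampling options go inside a nested settings object rather than at the top level of the task. A sketch of building that object with the hypothetical helper `build_settings`. It assumes the systemPrompt limits (1 to 50000) count characters, which this page does not state, and it passes other keyword names through unchanged, so callers should use the documented camelCase names.

```python
def build_settings(system_prompt=None, **sampling):
    """Hypothetical helper: wrap the system prompt and sampling options
    under a "settings" key. Length limits are assumed to be in characters."""
    settings = dict(sampling)  # e.g. temperature=0.2, maxTokens=256
    if system_prompt is not None:
        if not 1 <= len(system_prompt) <= 50000:
            raise ValueError("systemPrompt must be 1 to 50000 characters")
        settings["systemPrompt"] = system_prompt
    return {"settings": settings}
```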
temperature
float, min: 0, max: 2, step: 0.01
Controls randomness in generation. Lower values produce more deterministic output; higher values increase variation and creativity.
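Temperature rescales the model's logits before the softmax: dividing by a value below 1 sharpens the distribution, and dividing by a value above 1 flattens it. This is an illustrative sketch of that standard technique on a toy logit list, not the service's implementation. Treating 0 as greedy decoding is a common convention and an assumption here.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Illustrative sketch: logits -> probabilities after temperature scaling.

    temperature == 0 is treated as greedy (all mass on the argmax), which is
    an assumed convention, not documented behavior of this service.
    """
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```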
topP
float, min: 0, max: 1, step: 0.01
Nucleus sampling parameter: sampling is restricted to the smallest set of most probable tokens whose cumulative probability reaches topP. Lower values make output more focused; higher values increase diversity.
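To make nucleus sampling concrete, this sketch filters a toy probability list the standard way: sort by probability, keep tokens until the running total reaches topP, then renormalize. It illustrates the technique only; `top_p_filter` is a hypothetical helper, and the service's tie-breaking and edge cases may differ.

```python
def top_p_filter(probs, top_p):
    """Hypothetical helper: nucleus (top-p) filtering.

    Keeps the smallest set of highest-probability tokens whose cumulative
    probability reaches top_p (always at least one), then renormalizes.
    Returns {token_index: renormalized_probability}.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}
```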
frequencyPenalty
float, min: 0, max: 2, step: 0.01, default: 0
Penalizes tokens in proportion to how often they have already appeared in the output. A value of 0.0 disables the penalty.
maxTokens
integer, min: 1
Maximum number of tokens to generate in the response.
minP
float, min: 0, max: 1, step: 0.01, default: 0
Minimum probability threshold. Tokens with probability below this value are excluded from sampling.
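The description above reads as an absolute cutoff, but many inference engines (for example vLLM and llama.cpp) scale the min-p threshold by the probability of the most likely token. The sketch follows that relative convention; whether this service does the same is an assumption, and `min_p_filter` is a hypothetical helper.

```python
def min_p_filter(probs, min_p):
    """Hypothetical helper: relative min-p filtering (assumed convention).

    Drops tokens whose probability is below min_p * max(probs), then
    renormalizes. Returns {token_index: renormalized_probability}.
    """
    threshold = min_p * max(probs)
    kept = {i: p for i, p in enumerate(probs) if p >= threshold}
    total = sum(kept.values())
    return {i: p / total for i, p in kept.items()}
```

Because the threshold scales with the top probability, the filter prunes aggressively when the model is confident and keeps more candidates when the distribution is flat.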
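presencePenalty
float, min: -2, max: 2, step: 0.01, default: 0
Penalizes tokens that have already appeared in the output at least once, regardless of how often, which encourages the model to introduce new topics. A value of 0.0 disables the penalty; negative values have the opposite effect.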
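Frequency and presence penalties are commonly applied additively to the logits, in the form described in OpenAI's API documentation: frequencyPenalty scales with how many times a token has appeared, while presencePenalty is a flat deduction for any token seen at least once. The sketch assumes that form for this service; `apply_count_penalties` is a hypothetical helper operating on a toy token-to-logit dict.

```python
from collections import Counter


def apply_count_penalties(logits, generated, frequency_penalty=0.0,
                          presence_penalty=0.0):
    """Hypothetical helper: additive frequency/presence penalties (assumed form).

    For each token t already generated:
        logit[t] -= frequency_penalty * count(t) + presence_penalty
    `logits` maps token -> logit; `generated` is the list of tokens so far.
    """
    counts = Counter(generated)
    adjusted = dict(logits)
    for token, count in counts.items():
        if token in adjusted:
            adjusted[token] -= frequency_penalty * count + presence_penalty
    return adjusted
```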
repetitionPenalty
float, min: 0, max: 2, step: 0.01, default: 1
Penalizes tokens that have already appeared in the output. A value of 1.0 disables the penalty; values above 1.0 discourage repetition and values below 1.0 encourage it.
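Unlike the two penalties above, repetition penalty is usually multiplicative, which is why its neutral value is 1.0 rather than 0. This sketch uses the CTRL-style rule implemented in Hugging Face transformers (divide positive logits of seen tokens by the penalty, multiply negative ones). That this service uses the same rule is an assumption, and `apply_repetition_penalty` is a hypothetical helper that rejects a penalty of 0, since dividing by zero is undefined in this form.

```python
def apply_repetition_penalty(logits, seen_tokens, penalty):
    """Hypothetical helper: CTRL-style multiplicative repetition penalty.

    Positive logits of seen tokens are divided by `penalty` and negative ones
    multiplied by it, so penalty > 1 always lowers their score and 1.0 is a no-op.
    """
    if penalty <= 0:
        raise ValueError("penalty must be positive in this sketch")
    adjusted = dict(logits)
    for token in set(seen_tokens):
        if token in adjusted:
            value = adjusted[token]
            adjusted[token] = value / penalty if value > 0 else value * penalty
    return adjusted
```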
stopSequences
array of strings, min: 1
Sequences that cause the model to stop generating further tokens when encountered.
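Stop sequences are applied by the service during generation. The client-side sketch below only illustrates the resulting cut, which can be handy when post-processing streamed text. It assumes output ends just before the earliest matching sequence and that the sequence itself is excluded; `truncate_at_stop` is a hypothetical helper.

```python
def truncate_at_stop(text, stop_sequences):
    """Hypothetical helper: cut text at the earliest occurrence of any stop sequence.

    Returns (truncated_text, matched_sequence_or_None); the matched sequence
    is excluded from the returned text (an assumed convention).
    """
    earliest, matched = len(text), None
    for seq in stop_sequences:
        idx = text.find(seq)
        if idx != -1 and idx < earliest:
            earliest, matched = idx, seq
    return text[:earliest], matched
```

Scanning every sequence and keeping the smallest index matters: the first sequence in the list is not necessarily the first one to appear in the text.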
thinkingLevel
string, default: high
Controls the depth of internal reasoning the model performs before generating a response.
Allowed values: 2 (including the default, high).
topK
integer, min: 1, max: 100
Top-K sampling parameter that limits the number of highest-probability tokens considered at each step.
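Top-K is the simplest of the truncation schemes: keep the K most probable tokens and renormalize. An illustrative sketch on a toy probability list, with the hypothetical helper `top_k_filter`. When topK, topP, and minP are combined, the order in which the service applies them is not documented on this page.

```python
def top_k_filter(probs, top_k):
    """Hypothetical helper: keep only the top_k most probable tokens and
    renormalize. Returns {token_index: renormalized_probability}."""
    if top_k < 1:
        raise ValueError("topK must be >= 1")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = order[:top_k]  # slicing past the end simply keeps every token
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}
```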