Gemini 3.1 Pro
Gemini 3.1 Pro is Google’s flagship multimodal language model that processes text alongside images, audio, video, code, and documents. It offers high-performance reasoning, complex instruction following, and deep contextual understanding for a wide range of tasks across language, analysis, and problem solving.
API Options
Platform-level options for task execution and delivery.
-
taskType
string required value: textInference -
Identifier for the type of task being performed
-
taskUUID
string required UUID v4 -
UUID v4 identifier for tracking tasks and matching async responses. Must be unique per task.
-
webhookURL
string URI -
Specifies a webhook URL where JSON responses will be sent via HTTP POST when generation tasks complete. For batch requests with multiple results, each completed item triggers a separate webhook call as it becomes available.
Learn more 1 resource
- Webhooks PLATFORM
- Webhooks
-
deliveryMethod
string default: sync -
Determines how the API delivers task results.
Allowed values 3 values
- Returns complete results directly in the API response.
- Returns an immediate acknowledgment with the task UUID. Poll for results using getResponse.
- Streams results token-by-token as they are generated.
Learn more 1 resource
- Task Polling PLATFORM
-
includeCost
boolean default: false -
Include task cost in the response.
-
includeUsage
boolean default: false -
Include token usage statistics in the response.
-
numberResults
integer min: 1 max: 4 default: 1 -
Number of results to generate. Each result uses a different seed, producing variations of the same parameters.
Inputs
Input resources for the task (images, audio, etc). These must be nested inside the inputs object.
inputs object.-
inputs»audiosaudios
array of strings min items: 1 -
Array of audio inputs (UUID, URL, or Base64).
-
inputs»documentsdocuments
array of strings min items: 1 -
Array of document inputs (UUID, URL, or Base64).
-
inputs»imagesimages
array of strings min items: 1 -
Array of image inputs (UUID, URL, Data URI, or Base64).
-
inputs»videosvideos
array of strings min items: 1 -
Array of video inputs (UUID, URL, or Base64).
Generation Parameters
Core parameters for controlling the generated content.
-
model
string required value: google:gemini@3.1-pro -
Identifier of the model to use for generation.
Learn more 3 resources
-
seed
integer min: 0 max: 4294967295 -
Random seed for reproducible generation. When not provided, a random seed is generated in the unsigned 32-bit range.
-
messages
array of objects required min items: 1 -
Array of chat messages forming the conversation context.
Settings
Technical parameters to fine-tune the inference process. These must be nested inside the settings object.
settings object.-
settings»systemPromptsystemPrompt
string min: 1 max: 200000 -
System-level instruction that guides the model's behavior and output style across the entire generation.
-
settings»temperaturetemperature
float min: 0 max: 2 step: 0.01 default: 1 -
Controls randomness in generation. Lower values produce more deterministic outputs, higher values increase variation and creativity.
-
settings»topPtopP
float min: 0 max: 1 step: 0.01 -
Nucleus sampling parameter that controls diversity by limiting the probability mass. Lower values make outputs more focused, higher values increase diversity.
-
settings»maxTokensmaxTokens
integer min: 1 max: 128000 default: 4096 -
Maximum number of tokens to generate in the response.
-
settings»stopSequencesstopSequences
array of strings min: 1 max: 50 max items: 5 -
Array of sequences that will cause the model to stop generating further tokens when encountered.
-
settings»thinkingLevelthinkingLevel
string default: high -
Controls the depth of internal reasoning the model performs before generating a response.
Allowed values 3 values
Storm-Season Harbor Evacuation Brief
Reasoning
**Defining the Scope**
I've got the overall structure down, but I'm refining the "Inferences" section, as that's where my expertise will really shine. I am also working to ensure I don't accidentally repeat any observation in the final summary. Focusing on extracting the *most* impactful takeaways. Here is the emergency-response brief based on the provided video:
### Overall Situation Summary
Civilians and pets are boarding a yellow transport vehicle on a dock during severe weather and turbulent sea conditions, while personnel attempt to manage the boarding process length {
"taskType": "textInference",
"taskUUID": "378ac79a-0546-4a6f-a20e-761d22ecfbd2",
"model": "google:gemini@3.1-pro",
"seed": 28829,
"settings": {
"systemPrompt": "You are an expert multimodal incident analyst. Produce clear, structured, evidence-grounded observations from video. Distinguish direct observations from inferences. Use concise headings and bullet points.",
"thinkingLevel": "high",
"temperature": 0.35,
"topP": 0.9,
"maxTokens": 1200
},
"inputs": {
"videos": [
"https://assets.runware.ai/assets/inputs/dcc5ba5c-2b5c-41ea-98a5-2047babe0c96.mp4"
]
},
"messages": [
{
"role": "user",
"content": "Analyze the attached video and write a detailed emergency-response brief. Include: 1) a 5-bullet chronological timeline of visible events, 2) the main hazards present, 3) the people and roles you can identify, 4) critical text/signage visible in the scene, 5) three immediate actions responders should prioritize, and 6) a one-sentence overall situation summary. Base your answer only on what is visually supported by the video, and clearly label any reasonable inference as an inference."
}
]
}{
"taskType": "textInference",
"taskUUID": "378ac79a-0546-4a6f-a20e-761d22ecfbd2",
"text": "Here is the emergency-response brief based on the provided video:\n\n### Overall Situation Summary\nCivilians and pets are boarding a yellow transport vehicle on a dock during severe weather and turbulent sea conditions, while personnel attempt to manage the boarding process",
"seed": 28829,
"cost": 0.01572,
"finishReason": "length",
"reasoningContent": [
"**Defining the Scope**\n\nI've got the overall structure down, but I'm refining the \"Inferences\" section, as that's where my expertise will really shine. I am also working to ensure I don't accidentally repeat any observation in the final summary. Focusing on extracting the *most* impactful takeaways.\n\n\n"
],
"thoughtSignature": "[REDACTED]"
}Storm-Battered Lighthouse Evidence Board
Reasoning
**Reviewing and Refining**
I'm checking that my observations are distinct from my inferences and that I've maintained the correct tone. The text's nonsensical quality has me noting it as potentially coded or degraded. The environmental details, like water droplets, are being emphasized to enhance the scene. I am ready to polish the final report now. ### FORENSIC EVIDENCE BOARD ANALYSIS REPORT
**CASE STATUS:** Active
**SUBJECT:** Coastal Disappearance / Unidentified Incident
---
### I. Inventory of Main Objects (Visible Evidence)
The following items are affixed to or resting upon a cork bulletin board:
**Photographic Evidence:**
* **Photo A (Top Left):** A dark, turbulent body of water, likely a sea or ocean, under heavy overcast skies.
* **Photo B (Top Center):** A tall, white lighthouse with a dark top, situated on a coastline at dusk or night. The light is illuminated.
* **Photo C (Top Right):** A dark, indistinct coastal structure, possibly a pier, breakwater, or bridge.
* **Photo D (Center Right, Large):** A detailed image of a sea cave or grotto. The water inside is a vibrant, luminescent blue, contrasting with the dark rock.
**Documentary Evidence:**
* **Missing Person Poster (Bottom Left):** Features a black-and-white portrait of a young adult male with short hair.
* **Handwritten Journal (Bottom Left):** An open notebook with length {
"taskType": "textInference",
"taskUUID": "ead5ba1a-9a2d-46e3-8ee2-373fb66251f6",
"model": "google:gemini@3.1-pro",
"seed": 40503,
"settings": {
"systemPrompt": "You are an expert visual analyst. Ground every conclusion in visible evidence, separate observation from inference, and organize the response with clear section headings.",
"thinkingLevel": "high",
"temperature": 0.52,
"maxTokens": 1800,
"topP": 0.9
},
"messages": [
{
"role": "user",
"content": "Analyze the provided image as if you are a forensic investigator reviewing an evidence board. Identify the main objects, transcribe any clearly legible text, infer the likely sequence of events, explain what the red-string connections suggest, and end with three concise investigative leads ranked by confidence."
}
],
"inputs": {
"images": [
"https://assets.runware.ai/assets/inputs/f664b3a6-7d95-455f-b7b9-f68bde91a6b4.jpg"
]
}
}{
"taskType": "textInference",
"taskUUID": "ead5ba1a-9a2d-46e3-8ee2-373fb66251f6",
"text": "### FORENSIC EVIDENCE BOARD ANALYSIS REPORT\n\n**CASE STATUS:** Active\n**SUBJECT:** Coastal Disappearance / Unidentified Incident\n\n---\n\n### I. Inventory of Main Objects (Visible Evidence)\nThe following items are affixed to or resting upon a cork bulletin board:\n\n**Photographic Evidence:**\n* **Photo A (Top Left):** A dark, turbulent body of water, likely a sea or ocean, under heavy overcast skies.\n* **Photo B (Top Center):** A tall, white lighthouse with a dark top, situated on a coastline at dusk or night. The light is illuminated.\n* **Photo C (Top Right):** A dark, indistinct coastal structure, possibly a pier, breakwater, or bridge.\n* **Photo D (Center Right, Large):** A detailed image of a sea cave or grotto. The water inside is a vibrant, luminescent blue, contrasting with the dark rock.\n\n**Documentary Evidence:**\n* **Missing Person Poster (Bottom Left):** Features a black-and-white portrait of a young adult male with short hair.\n* **Handwritten Journal (Bottom Left):** An open notebook with",
"seed": 40503,
"cost": 0.02386,
"finishReason": "length",
"reasoningContent": [
"**Reviewing and Refining**\n\nI'm checking that my observations are distinct from my inferences and that I've maintained the correct tone. The text's nonsensical quality has me noting it as potentially coded or degraded. The environmental details, like water droplets, are being emphasized to enhance the scene. I am ready to polish the final report now.\n\n\n"
],
"thoughtSignature": "[REDACTED]"
}Forensic Market Stall Reconstruction
Reasoning
**Outlining the Details**
I'm now fully immersed in the section on concrete evidence. I've broken down the environment, the stall, the incident, and the people involved into distinct logical groupings. My focus is on meticulous observation, and I've ensured I include at least five details that a casual viewer might overlook. My attention to detail is paramount. Here is the visual investigation based on the provided image.
### 1) Scene Summary
The image captures a sudden, disruptive moment in a narrow, rain-slicked alleyway at night, likely in an East Asian city. The focal point is a warmly lit, traditional street food stall where an accident has just occurred, resulting in scattered oranges and broken ceramics on the wet pavement. Two men frame the scene on the left and right, both frozen in exaggerated postures of shock or dismay. The mood is a striking contrast: the cozy, steamy warmth of the food stall and the calm presence of a cat clash with the cinematic tension, sudden kinetic energy of the spill, and the length {
"taskType": "textInference",
"taskUUID": "9be73003-bbc8-4ca9-8d15-0cce6abb2bb2",
"model": "google:gemini@3.1-pro",
"seed": 29225,
"settings": {
"temperature": 0.46,
"topP": 0.9,
"maxTokens": 1800,
"thinkingLevel": "high",
"systemPrompt": "You are an elite multimodal analyst. Be precise, grounded, and structured. Never claim certainty beyond what the image supports. Separate observation from inference."
},
"messages": [
{
"role": "user",
"content": "Analyze the provided image like a meticulous visual investigator. Return four sections with clear headings: 1) Scene Summary, 2) Concrete Evidence Observed, 3) Most Likely Recent Sequence of Events, 4) Uncertainties and Alternate Interpretations. Distinguish direct observations from inference, cite specific visual cues, mention any readable text, estimate the mood, and note at least five small details that a casual viewer might miss."
}
],
"inputs": {
"images": [
"https://assets.runware.ai/assets/inputs/c5c8c57a-3337-4cb9-803a-c14a92b0742c.jpg"
]
}
}{
"taskType": "textInference",
"taskUUID": "9be73003-bbc8-4ca9-8d15-0cce6abb2bb2",
"text": "Here is the visual investigation based on the provided image.\n\n### 1) Scene Summary\nThe image captures a sudden, disruptive moment in a narrow, rain-slicked alleyway at night, likely in an East Asian city. The focal point is a warmly lit, traditional street food stall where an accident has just occurred, resulting in scattered oranges and broken ceramics on the wet pavement. Two men frame the scene on the left and right, both frozen in exaggerated postures of shock or dismay. The mood is a striking contrast: the cozy, steamy warmth of the food stall and the calm presence of a cat clash with the cinematic tension, sudden kinetic energy of the spill, and the",
"seed": 29225,
"cost": 0.02394,
"finishReason": "length",
"reasoningContent": [
"**Outlining the Details**\n\nI'm now fully immersed in the section on concrete evidence. I've broken down the environment, the stall, the incident, and the people involved into distinct logical groupings. My focus is on meticulous observation, and I've ensured I include at least five details that a casual viewer might overlook. My attention to detail is paramount.\n\n\n"
],
"thoughtSignature": "[REDACTED]"
}Multimodal Incident Brief
Reasoning
**Constructing a Concise Brief**
I've been working on a structured incident brief. I've broken down the video into sections: scene overview, timeline, risks, observations, and recommended actions. My focus now is on tightening the language, aiming for clarity and professionalism, making the report as concise as possible while retaining all the key information. Here is a structured incident brief based on the provided video:
### Incident Brief
**1) Scene Overview**
The event takes place in a modern corporate conference room length {
"taskType": "textInference",
"taskUUID": "cca8b53f-1eaa-4bc9-bb77-ffdc5e133e1b",
"model": "google:gemini@3.1-pro",
"settings": {
"maxTokens": 900,
"temperature": 0.3,
"thinkingLevel": "high",
"systemPrompt": "You are an expert multimodal operations analyst. Extract salient details from video carefully, avoid speculation, distinguish observed facts from inferred risks, and present findings in a clear business-ready format."
},
"messages": [
{
"role": "user",
"content": "Analyze the uploaded video and produce a structured incident brief with these sections: 1) Scene overview, 2) Chronological event timeline, 3) Potential safety or operational risks, 4) Evidence-based observations only, 5) Recommended next actions. Keep the response concise, professional, and grounded only in what is visible in the video."
}
],
"inputs": {
"videos": [
"https://assets.runware.ai/assets/inputs/570c981b-3a95-4265-8371-1ff19306b9bc.mp4"
]
}
}{
"taskType": "textInference",
"taskUUID": "cca8b53f-1eaa-4bc9-bb77-ffdc5e133e1b",
"text": "Here is a structured incident brief based on the provided video:\n\n### Incident Brief\n\n**1) Scene Overview**\nThe event takes place in a modern corporate conference room",
"cost": 0.01311,
"finishReason": "length",
"reasoningContent": [
"**Constructing a Concise Brief**\n\nI've been working on a structured incident brief. I've broken down the video into sections: scene overview, timeline, risks, observations, and recommended actions. My focus now is on tightening the language, aiming for clarity and professionalism, making the report as concise as possible while retaining all the key information.\n\n\n"
],
"thoughtSignature": "[REDACTED]"
}