Qwen2.5-VL-7B-Instruct
Qwen2.5-VL-7B-Instruct is a multimodal model that processes images and text together to perform visual reasoning, captioning, question answering, and structured output generation. It integrates a vision encoder with a 7B instruction-tuned language backbone to support rich interactive multimodal understanding.
API Options
Platform-level options for task execution and delivery.
-
taskType
string required value: caption -
Identifier for the type of task being performed
-
taskUUID
string required UUID v4 -
UUID v4 identifier for tracking tasks and matching async responses. Must be unique per task.
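A minimal sketch of generating a fresh identifier per task, assuming a Python client. The helper name `new_task_uuid` is hypothetical and not part of the API.

```python
import uuid


def new_task_uuid() -> str:
    """Return a random (version 4) UUID in its canonical string form."""
    return str(uuid.uuid4())


# Generate one identifier per task; never reuse it across tasks.
task_uuid = new_task_uuid()
```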
-
outputType
string default: URL -
Image output type.
Allowed values 3 values
-
outputFormat
string default: JPG -
Specifies the file format of the generated output. The available values depend on the task type and the specific model's capabilities.
- `JPG`: Best for photorealistic images with smaller file sizes (no transparency).
- `PNG`: Lossless compression, supports high quality and transparency (alpha channel).
- `WEBP`: Modern format providing superior compression and transparency support.
**Transparency**: If you are using features like background removal or LayerDiffuse that require transparency, you must select a format that supports an alpha channel (e.g., `PNG`, `WEBP`, `TIFF`). `JPG` does not support transparency.
Allowed values 3 values
-
outputQuality
integer min: 20 max: 99 default: 95 -
Compression quality of the output. Higher values preserve quality but increase file size.
-
webhookURL
string URI -
Specifies a webhook URL where JSON responses will be sent via HTTP POST when generation tasks complete. For batch requests with multiple results, each completed item triggers a separate webhook call as it becomes available.
Learn more 1 resource
- Webhooks PLATFORM
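Because each completed item arrives as its own POST, a receiver typically needs to group calls by task. The sketch below shows one way to do that. It assumes the POST body is a JSON object with a top-level `taskUUID` field, like the response examples in this document; the actual webhook payload shape is not specified here, and `record_webhook_call` and `results_by_task` are hypothetical names.

```python
import json
from collections import defaultdict

# Results grouped by task UUID; a batch task may trigger several calls.
results_by_task = defaultdict(list)


def record_webhook_call(raw_body: bytes) -> str:
    """Parse one webhook POST body and file it under its task UUID.

    Assumption: the body is a JSON object with a top-level "taskUUID".
    """
    payload = json.loads(raw_body.decode("utf-8"))
    task_uuid = payload["taskUUID"]
    results_by_task[task_uuid].append(payload)
    return task_uuid
```

A web framework handler would pass the raw request body to `record_webhook_call` and reply with a 2xx status.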
-
deliveryMethod
string default: sync -
Determines how the API delivers task results.
Allowed values 2 values
- Returns complete results directly in the API response.
- Returns an immediate acknowledgment with the task UUID. Poll for results using getResponse.
Learn more 1 resource
- Task Polling PLATFORM
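The polling flow for the second delivery method can be sketched as follows. This is an illustration under stated assumptions: the `getResponse` payload shape (a task carrying the original `taskUUID`) is assumed, and `send` is a hypothetical transport callable supplied by the caller that returns a result dict, or `None` while the task is still pending.

```python
import time
from typing import Callable, Optional


def build_get_response_task(task_uuid: str) -> dict:
    # Assumed payload shape: a "getResponse" task that references the
    # UUID of the task whose result is being requested.
    return {"taskType": "getResponse", "taskUUID": task_uuid}


def poll_for_result(
    send: Callable[[dict], Optional[dict]],
    task_uuid: str,
    attempts: int = 5,
    delay_seconds: float = 1.0,
) -> Optional[dict]:
    """Call `send` repeatedly until it returns a result or attempts run out."""
    task = build_get_response_task(task_uuid)
    for attempt in range(attempts):
        result = send(task)
        if result is not None:
            return result
        # Wait between attempts, but not after the final one.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return None
```

Injecting `send` keeps the loop independent of the HTTP or WebSocket client in use.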
-
uploadEndpoint
string URI -
Specifies a URL where the generated content will be automatically uploaded using the HTTP PUT method. The raw binary data of the media file is sent directly as the request body. For secure uploads to cloud storage, use presigned URLs that include temporary authentication credentials.
Common use cases:
- Cloud storage: Upload directly to S3 buckets, Google Cloud Storage, or Azure Blob Storage using presigned URLs.
- CDN integration: Upload to content delivery networks for immediate distribution.
// S3 presigned URL for secure upload
https://your-bucket.s3.amazonaws.com/generated/content.mp4?X-Amz-Signature=abc123&X-Amz-Expires=3600
// Google Cloud Storage presigned URL
https://storage.googleapis.com/your-bucket/content.jpg?X-Goog-Signature=xyz789
// Custom storage endpoint
https://storage.example.com/uploads/generated-image.jpg
The content data will be sent as the request body to the specified URL when generation is complete.
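When testing a custom storage endpoint, it can help to reproduce the kind of request described above: an HTTP PUT whose body is the raw media bytes. The sketch below only builds such a request with the standard library; `build_upload_request` is a hypothetical helper, and any headers the platform adds are not documented here and are therefore omitted.

```python
import urllib.request


def build_upload_request(upload_endpoint: str, media: bytes) -> urllib.request.Request:
    """Build a PUT request that carries raw media bytes as its body."""
    return urllib.request.Request(upload_endpoint, data=media, method="PUT")


request = build_upload_request(
    "https://storage.example.com/uploads/generated-image.jpg",
    b"\xff\xd8\xff",  # placeholder bytes standing in for a real file
)
# Passing `request` to urllib.request.urlopen would actually send it.
```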
-
ttl
integer min: 60 -
Time-to-live (TTL) in seconds for generated content. Only applies when `outputType` is `URL`.
-
includeCost
boolean default: false -
Include task cost in the response.
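The numeric limits listed above can be checked on the client before a task is sent. The following is a small sketch of such a check, covering only the constraints stated in this section; `check_output_options` is a hypothetical helper, not part of the API.

```python
def check_output_options(options: dict) -> list:
    """Return a list of problems found in the given option values."""
    problems = []

    quality = options.get("outputQuality")
    if quality is not None and not (20 <= quality <= 99):
        problems.append("outputQuality must be between 20 and 99")

    ttl = options.get("ttl")
    if ttl is not None:
        if ttl < 60:
            problems.append("ttl must be at least 60 seconds")
        # outputType defaults to URL when it is not set.
        if options.get("outputType", "URL") != "URL":
            problems.append("ttl only applies when outputType is URL")

    return problems
```

An empty list means none of the checked limits were violated.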
Inputs
Input resources for the task (images, audio, etc). These must be nested inside the inputs object.
-
inputs»image
string required -
Image input (UUID, URL, Data URI, or Base64).
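For local files, one of the accepted forms is a Data URI. A minimal sketch of building one with the standard library is shown below; `to_data_uri` is a hypothetical helper, and the default MIME type is an assumption that should match the actual file.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 Data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes stand in for the contents of a real image file.
inputs = {"image": to_data_uri(b"\xff\xd8\xff\xe0")}
```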
Harbor Teahouse After Storm
The image captures a serene waterfront scene where three individuals are engaged in different activities. The central figure, wearing an apron, is meticulously arranging small cups on a table, suggesting he might be preparing tea or coffee. To his right, two men are seated at another table, engrossed in examining a map, possibly planning a journey or discussing directions. The backdrop features a misty harbor filled with docked fishing boats, adding a sense of calm and anticipation to the setting.
{
"taskType": "caption",
"taskUUID": "eb705042-61a2-4922-8945-642ebf948d57",
"model": "runware:152@2",
"prompt": "Write a rich caption for this image in 2-3 sentences. Identify the primary setting, the main human activity, notable background details, and the overall mood. Keep the description grounded in visible evidence and avoid speculation.",
"inputs": {
"image": "https://assets.runware.ai/assets/inputs/ddf33190-1c7c-4b47-9207-d219fdd0491c.jpg"
}
}
{
"taskType": "imageCaption",
"taskUUID": "eb705042-61a2-4922-8945-642ebf948d57",
"text": "The image captures a serene waterfront scene where three individuals are engaged in different activities. The central figure, wearing an apron, is meticulously arranging small cups on a table, suggesting he might be preparing tea or coffee. To his right, two men are seated at another table, engrossed in examining a map, possibly planning a journey or discussing directions. The backdrop features a misty harbor filled with docked fishing boats, adding a sense of calm and anticipation to the setting.",
"cost": 0.0019
}
Moonlit Archive Courtyard Tableau
A medieval courtyard scene under a night sky with a crescent moon, where scholars in period attire study ancient manuscripts by candlelight. The central table is cluttered with scrolls, books, and scholarly tools, while a white cat perches on a fountain nearby. Ivy-clad walls and hanging papers add to the historical ambiance.
{
"taskType": "caption",
"taskUUID": "3e5eacba-0586-498a-a217-b3eb1135c99f",
"model": "runware:152@2",
"prompt": "Write a concise but vivid caption for this image. Mention the setting, the main people or animals present, and the most notable objects or activities without speculating beyond what is visible.",
"inputs": {
"image": "https://assets.runware.ai/assets/inputs/cfa6e306-886f-444b-b3bd-1d4d4f1de7eb.jpg"
}
}
{
"taskType": "imageCaption",
"taskUUID": "3e5eacba-0586-498a-a217-b3eb1135c99f",
"text": "A medieval courtyard scene under a night sky with a crescent moon, where scholars in period attire study ancient manuscripts by candlelight. The central table is cluttered with scrolls, books, and scholarly tools, while a white cat perches on a fountain nearby. Ivy-clad walls and hanging papers add to the historical ambiance.",
"cost": 0.0013
}
Fossil Hall Field Notes
The image captures a lively educational setting within a museum, featuring a large Tyrannosaurus rex skeleton as the centerpiece. The T-Rex is mounted high above the floor, dominating the space with its imposing presence. Below, a group of students, some holding notebooks, attentively listen to a guide dressed in a yellow cardigan who gestures towards the dinosaur's anatomy. The background showcases glass display cases containing various fossils, including ammonites and other marine life, adding depth to the exhibit.
{
"taskType": "caption",
"taskUUID": "74654116-0ef8-4f02-ba4b-ffcbd9e8fb32",
"model": "runware:152@2",
"prompt": "Write a rich, accurate caption for this image. Identify the main subject, notable secondary details, the setting, lighting, and the overall educational atmosphere. Mention any visible human activity and the relationship between the fossil displays and visitors. Keep it concise but specific.",
"inputs": {
"image": "https://assets.runware.ai/assets/inputs/3a9f575c-904c-417e-a3e1-cf347cc83f10.jpg"
}
}
{
"taskType": "imageCaption",
"taskUUID": "74654116-0ef8-4f02-ba4b-ffcbd9e8fb32",
"text": "The image captures a lively educational setting within a museum, featuring a large Tyrannosaurus rex skeleton as the centerpiece. The T-Rex is mounted high above the floor, dominating the space with its imposing presence. Below, a group of students, some holding notebooks, attentively listen to a guide dressed in a yellow cardigan who gestures towards the dinosaur's anatomy. The background showcases glass display cases containing various fossils, including ammonites and other marine life, adding depth to the exhibit.",
"cost": 0.0019
}
Tidal Observatory Reading Room
In this serene and scholarly room, two individuals are engrossed in their work amidst a collection of books, maps, and scientific instruments. A cat lounges comfortably on a stool near a telescope, while seagulls fly outside the large arched windows, suggesting a coastal location. The atmosphere is one of quiet concentration and intellectual pursuit.
{
"taskType": "caption",
"taskUUID": "2ce3fe26-9703-4876-99b4-d6d317efb096",
"model": "runware:152@2",
"prompt": "Write a vivid 2-3 sentence caption describing the main setting, notable objects, and human activity in the image. Keep it natural, specific, and observational.",
"inputs": {
"image": "https://assets.runware.ai/assets/inputs/697d2c97-482e-4c15-af86-1a2df26db98b.jpg"
}
}
{
"taskType": "imageCaption",
"taskUUID": "2ce3fe26-9703-4876-99b4-d6d317efb096",
"text": "In this serene and scholarly room, two individuals are engrossed in their work amidst a collection of books, maps, and scientific instruments. A cat lounges comfortably on a stool near a telescope, while seagulls fly outside the large arched windows, suggesting a coastal location. The atmosphere is one of quiet concentration and intellectual pursuit.",
"cost": 0.0013
}
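The request and response pairs above share a common shape, which can be sketched in Python as follows. The field names mirror the examples; `build_caption_task` and `read_caption` are hypothetical helpers, and how the task is actually transmitted (endpoint, authentication) is outside this sketch.

```python
import json
import uuid


def build_caption_task(image: str, prompt: str, include_cost: bool = False) -> dict:
    """Build a caption task shaped like the request examples above."""
    task = {
        "taskType": "caption",
        "taskUUID": str(uuid.uuid4()),
        "model": "runware:152@2",
        "prompt": prompt,
        "inputs": {"image": image},
    }
    if include_cost:
        task["includeCost"] = True
    return task


def read_caption(response_json: str, expected_uuid: str):
    """Return (text, cost) from a response, checking it matches the task."""
    response = json.loads(response_json)
    if response.get("taskUUID") != expected_uuid:
        raise ValueError("response does not belong to this task")
    # "cost" may be absent, so fall back to None.
    return response["text"], response.get("cost")
```

Matching on `taskUUID` is what lets a client pair each response with the request that produced it.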