Qwen2.5-VL-3B-Instruct
Qwen2.5-VL-3B-Instruct is a multimodal model that processes images and text together to perform visual reasoning, captioning, question answering, and structured output tasks. It integrates a vision encoder with an instruction-tuned language backbone to support complex visual understanding and interactive multimodal responses.
API Options
Platform-level options for task execution and delivery.
-
taskType
string required value: caption -
Identifier for the type of task being performed
-
taskUUID
string required UUID v4 -
UUID v4 identifier for tracking tasks and matching async responses. Must be unique per task.
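As a minimal sketch of generating a fresh identifier per task, Python's standard `uuid` module can be used. The helper name `new_task_uuid` is illustrative and not part of the API.

```python
import uuid


def new_task_uuid() -> str:
    """Return a random UUID v4 string suitable for the taskUUID field."""
    return str(uuid.uuid4())


# Each task gets its own identifier; never reuse one across tasks.
task = {"taskType": "caption", "taskUUID": new_task_uuid()}
```

Keeping the generated value alongside the request makes it possible to match the response (or a later async result) back to the task that produced it.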
-
outputType
string default: URL -
Image output type.
-
outputFormat
string default: JPG -
Specifies the file format of the generated output. The available values depend on the task type and the specific model's capabilities.
- `JPG`: Best for photorealistic images with smaller file sizes (no transparency).
- `PNG`: Lossless compression, supports high quality and transparency (alpha channel).
- `WEBP`: Modern format providing superior compression and transparency support.
**Transparency**: If you are using features like background removal or LayerDiffuse that require transparency, you must select a format that supports an alpha channel (e.g., `PNG`, `WEBP`, `TIFF`). `JPG` does not support transparency.
-
outputQuality
integer min: 20 max: 99 default: 95 -
Compression quality of the output. Higher values preserve quality but increase file size.
-
webhookURL
string URI -
Specifies a webhook URL where JSON responses will be sent via HTTP POST when generation tasks complete. For batch requests with multiple results, each completed item triggers a separate webhook call as it becomes available.
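On the receiving side, a webhook handler mainly needs to parse the POSTed JSON and match it to a task that is still pending. The sketch below assumes, without confirmation from this page, that the body is a JSON object with a top-level `taskUUID` field like the example responses. The function name `match_webhook` and the `pending` dictionary are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def match_webhook(body: bytes, pending: Dict[str, Any]) -> Optional[Any]:
    """Parse a webhook POST body and pop the pending entry for its taskUUID.

    Assumption: the body is a JSON object carrying a top-level "taskUUID".
    Returns None for malformed bodies or unknown task identifiers.
    """
    try:
        payload = json.loads(body)
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return None
    if not isinstance(payload, dict):
        return None
    task_uuid = payload.get("taskUUID")
    if not isinstance(task_uuid, str):
        return None
    return pending.pop(task_uuid, None)
```

Because each completed item in a batch triggers its own call, popping entries one at a time from `pending` is a simple way to tell when every result has arrived.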
-
deliveryMethod
string default: sync -
Determines how the API delivers task results.
Allowed values:
- `sync`: Returns complete results directly in the API response.
- `async`: Returns an immediate acknowledgment with the task UUID. Poll for results using `getResponse`.
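A polling loop for asynchronous delivery might look like the sketch below. The shape of the `getResponse` request (a `taskType` plus the original `taskUUID`) is an assumption, and `send` is a hypothetical caller-supplied function that submits one task and returns the parsed result, or `None` while the result is not ready.

```python
import time
from typing import Any, Callable, Dict, Optional

Task = Dict[str, Any]


def poll_for_result(
    task_uuid: str,
    send: Callable[[Task], Optional[Task]],
    interval_s: float = 1.0,
    max_attempts: int = 30,
) -> Optional[Task]:
    """Ask for the result of task_uuid until it arrives or attempts run out."""
    # Assumed request shape; check the Task Polling documentation.
    request = {"taskType": "getResponse", "taskUUID": task_uuid}
    for _ in range(max_attempts):
        reply = send(request)
        if reply is not None:
            return reply
        time.sleep(interval_s)  # wait before asking again
    return None
```

Injecting `send` keeps the loop independent of the transport, so the same function works over HTTP or any other client.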
-
uploadEndpoint
string URI -
Specifies a URL where the generated content will be automatically uploaded using the HTTP PUT method. The raw binary data of the media file is sent directly as the request body. For secure uploads to cloud storage, use presigned URLs that include temporary authentication credentials.
Common use cases:
- Cloud storage: Upload directly to S3 buckets, Google Cloud Storage, or Azure Blob Storage using presigned URLs.
- CDN integration: Upload to content delivery networks for immediate distribution.
// S3 presigned URL for secure upload
https://your-bucket.s3.amazonaws.com/generated/content.mp4?X-Amz-Signature=abc123&X-Amz-Expires=3600
// Google Cloud Storage presigned URL
https://storage.googleapis.com/your-bucket/content.jpg?X-Goog-Signature=xyz789
// Custom storage endpoint
https://storage.example.com/uploads/generated-image.jpg
The content data will be sent as the request body to the specified URL when generation is complete.
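To check that a custom endpoint accepts this kind of upload before passing it as `uploadEndpoint`, an equivalent request can be built with Python's standard library. This is only a sketch of an HTTP PUT with a raw binary body; the helper name `build_put_upload` is illustrative, and any headers the platform adds are not documented here.

```python
import urllib.request


def build_put_upload(upload_url: str, media: bytes) -> urllib.request.Request:
    """Build an HTTP PUT request whose body is the raw media bytes."""
    return urllib.request.Request(upload_url, data=media, method="PUT")


# To actually send the request:
#   with urllib.request.urlopen(build_put_upload(url, data)) as resp:
#       print(resp.status)
```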
-
ttl
integer min: 60 -
Time-to-live (TTL) in seconds for generated content. Only applies when `outputType` is `URL`.
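The numeric limits listed above can be checked on the client before a task is submitted. The following is a minimal sketch, not an official validator: the function name `check_options` and the message strings are made up for illustration, and only the constraints stated on this page are covered.

```python
from typing import Any, Dict, List


def check_options(task: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the platform-level options."""
    problems: List[str] = []

    quality = task.get("outputQuality")
    if quality is not None and not 20 <= quality <= 99:
        problems.append("outputQuality must be between 20 and 99")

    ttl = task.get("ttl")
    if ttl is not None:
        if ttl < 60:
            problems.append("ttl must be at least 60 seconds")
        # outputType defaults to URL when it is omitted.
        if task.get("outputType", "URL") != "URL":
            problems.append("ttl only applies when outputType is URL")

    return problems
```

An empty list means none of the documented limits were violated; it does not guarantee the server accepts the task.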
-
includeCost
boolean default: false -
Include task cost in the response.
Inputs
Input resources for the task (images, audio, etc). These must be nested inside the inputs object.
-
inputs»image
string required -
Image input (UUID, URL, Data URI, or Base64).
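When the image is a local file rather than a hosted URL, one option is to send it as a Data URI. The sketch below builds one from raw bytes; the helper name `to_data_uri` and the default MIME type are assumptions chosen for illustration.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 Data URI string."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Usage: nest the value inside the inputs object, for example
#   with open("photo.jpg", "rb") as f:
#       inputs = {"image": to_data_uri(f.read())}
```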
Taxidermy Workshop Curiosity Cabinet
In a dimly lit, sepia-toned laboratory filled with shelves of jars containing various specimens, two birds stand on a wooden desk. The desk is cluttered with tools, including a magnifying glass, a small knife, and a green and red object that resembles a toy car. A large glass dome encases a collection of feathers and other natural items. To the left, a book lies open, while to the right, a skull and other anatomical models are displayed.
{
"taskType": "caption",
"taskUUID": "74c7c126-bc3d-4e55-b8fa-36e20e2a4e62",
"model": "runware:152@1",
"prompt": "Write a precise 5-7 sentence caption describing the scene, highlighting the main subject, notable objects, materials, lighting, and overall atmosphere. Mention spatial relationships clearly and avoid speculation beyond what is visible.",
"inputs": {
"image": "https://assets.runware.ai/assets/inputs/465571c4-411b-476e-a975-6e8cc06c397d.jpg"
}
}
{
"taskType": "imageCaption",
"taskUUID": "74c7c126-bc3d-4e55-b8fa-36e20e2a4e62",
"text": "In a dimly lit, sepia-toned laboratory filled with shelves of jars containing various specimens, two birds stand on a wooden desk. The desk is cluttered with tools, including a magnifying glass, a small knife, and a green and red object that resembles a toy car. A large glass dome encases a collection of feathers and other natural items. To the left, a book lies open, while to the right, a skull and other anatomical models are displayed.",
"cost": 0.0026
}
Harbor Market Eclipse Tableau
In this vibrant seaside town, a dramatic solar eclipse casts a radiant halo over the bustling market, where vendors sell fresh produce and seafood. Children play near a makeshift telescope, while adults tend to their stalls under the enchanting sky. The town's architecture features colorful awnings and signs in Arabic script, adding to the lively atmosphere. A large ship sails past the dock, and seagulls soar above, enhancing the scene's dynamic energy.
{
"taskType": "caption",
"taskUUID": "3e62120b-9210-4117-9231-9f6bdbfecc8a",
"model": "runware:152@1",
"prompt": "Write a precise, vivid caption for this image in 2-3 sentences. Identify the main setting, the unusual sky event, notable human activity, and at least five specific visual details without guessing hidden context.",
"inputs": {
"image": "https://assets.runware.ai/assets/inputs/13dcd4da-6c25-4422-b900-b3abab18ce24.jpg"
}
}
{
"taskType": "imageCaption",
"taskUUID": "3e62120b-9210-4117-9231-9f6bdbfecc8a",
"text": "In this vibrant seaside town, a dramatic solar eclipse casts a radiant halo over the bustling market, where vendors sell fresh produce and seafood. Children play near a makeshift telescope, while adults tend to their stalls under the enchanting sky. The town's architecture features colorful awnings and signs in Arabic script, adding to the lively atmosphere. A large ship sails past the dock, and seagulls soar above, enhancing the scene's dynamic energy.",
"cost": 0.0026
}
Salt-Flat Breakfast Balloonists
Two individuals are enjoying a picnic on a vast, snowy plain under a clear sky. They are seated on patterned rugs, surrounded by a colorful hot air balloon festival. The balloons are vibrant, with hues of red, yellow, blue, and green, floating above them. The setting sun casts a warm glow over the scene, creating a serene and picturesque atmosphere.
{
"taskType": "caption",
"taskUUID": "e340cc77-2772-4f13-8297-1a7dffcf525c",
"model": "runware:152@1",
"prompt": "Write a vivid, accurate caption for this image in 2-3 sentences. Describe the main subjects, setting, notable objects, colors, lighting, and the overall mood. Avoid speculation beyond what is clearly visible.",
"inputs": {
"image": "https://assets.runware.ai/assets/inputs/02da3f55-7a4b-4064-a905-ca325e849a70.jpg"
}
}
{
"taskType": "imageCaption",
"taskUUID": "e340cc77-2772-4f13-8297-1a7dffcf525c",
"text": "Two individuals are enjoying a picnic on a vast, snowy plain under a clear sky. They are seated on patterned rugs, surrounded by a colorful hot air balloon festival. The balloons are vibrant, with hues of red, yellow, blue, and green, floating above them. The setting sun casts a warm glow over the scene, creating a serene and picturesque atmosphere.",
"cost": 0.0019
}
Rooftop Pigeon Racing Finale
A lively rooftop gathering features two men flying colorful kites amidst a flock of pigeons, with a backdrop of traditional buildings and a minaret silhouetted against the setting sun.
{
"taskType": "caption",
"taskUUID": "66a9623a-dbd9-4db9-8975-592584b3d844",
"model": "runware:152@1",
"prompt": "Write a concise, vivid caption in 1-2 sentences describing the main action, setting, and mood of this image.",
"inputs": {
"image": "https://assets.runware.ai/assets/inputs/7590c546-9819-4c3b-b3a7-3b16838e1222.jpg"
}
}
{
"taskType": "imageCaption",
"taskUUID": "66a9623a-dbd9-4db9-8975-592584b3d844",
"text": "A lively rooftop gathering features two men flying colorful kites amidst a flock of pigeons, with a backdrop of traditional buildings and a minaret silhouetted against the setting sun.",
"cost": 0.0013
}
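
The request and response pairs above can be assembled and read programmatically. The sketch below mirrors only the fields shown in these examples; the helper names are hypothetical, and the endpoint, authentication, and any response envelope are deliberately left out because this page does not describe them.

```python
import uuid
from typing import Any, Dict, Optional


def build_caption_task(
    image: str,
    prompt: str,
    model: str = "runware:152@1",  # model identifier used in the examples
    include_cost: bool = False,
) -> Dict[str, Any]:
    """Build a caption task with the same fields as the example requests."""
    task: Dict[str, Any] = {
        "taskType": "caption",
        "taskUUID": str(uuid.uuid4()),
        "model": model,
        "prompt": prompt,
        "inputs": {"image": image},  # image must be nested inside inputs
    }
    if include_cost:
        task["includeCost"] = True
    return task


def caption_from_response(response: Dict[str, Any], task_uuid: str) -> Optional[str]:
    """Return the caption text if the response belongs to task_uuid."""
    if response.get("taskUUID") != task_uuid:
        return None
    return response.get("text")
```

Matching on `taskUUID` before reading `text` guards against mixing up results when several tasks are in flight at once.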