---
title: Qwen2.5-VL-3B-Instruct | Runware Docs
url: https://runware.ai/docs/models/alibaba-qwen2-5-vl-3b-instruct
description: Instruction-tuned vision-language model for image and text understanding
---
# Qwen2.5-VL-3B-Instruct

Qwen2.5-VL-3B-Instruct is a multimodal model that processes images and text together to perform visual reasoning, captioning, question answering, and structured output tasks. It integrates a vision encoder with an instruction-tuned language backbone to support complex visual understanding and interactive multimodal responses.

- **ID**: `runware:152@1`
- **Status**: live
- **Creator**: Alibaba
- **Release Date**: August 24, 2023
- **Capabilities**: Image to Text, Caption

## Pricing

- **90 - 118 tokens**: `$0.0026`

## Request Parameters

**API Options**

Platform-level options for task execution and delivery.

### [taskType](https://runware.ai/docs/models/alibaba-qwen2-5-vl-3b-instruct#request-tasktype)

- **Type**: `string`
- **Required**: true
- **Value**: `caption`

Identifier for the type of task being performed. For this model it must be `caption`.

### [taskUUID](https://runware.ai/docs/models/alibaba-qwen2-5-vl-3b-instruct#request-taskuuid)

- **Type**: `string`
- **Required**: true
- **Format**: `UUID v4`

UUID v4 identifier for tracking tasks and matching async responses. Must be unique per task.
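As a minimal sketch, a fresh `taskUUID` can be produced with Python's standard `uuid` module. The helper name `new_task_uuid` is hypothetical and only used for illustration.

```python
import uuid


def new_task_uuid() -> str:
    """Return a random UUID v4 string suitable for the taskUUID field."""
    return str(uuid.uuid4())


# Generate one identifier per task; never reuse it across tasks.
task_uuid = new_task_uuid()
print(task_uuid)
```

Generating the identifier on the client side makes it possible to match a later async or webhook response to the request that produced it.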

### [outputType](https://runware.ai/docs/models/alibaba-qwen2-5-vl-3b-instruct#request-outputtype)

- **Type**: `string`
- **Default**: `URL`

Image output type.

**Allowed values**: `URL` `base64Data` `dataURI`

### [outputFormat](https://runware.ai/docs/models/alibaba-qwen2-5-vl-3b-instruct#request-outputformat)

- **Type**: `string`
- **Default**: `JPG`

Specifies the file format of the generated output. The available values depend on the task type and the specific model's capabilities.

- `JPG`: Best for photorealistic images with smaller file sizes (no transparency).
- `PNG`: Lossless compression, supports high quality and transparency (alpha channel).
- `WEBP`: Modern format providing superior compression and transparency support.

> [!NOTE]
> **Transparency**: If you are using features like background removal or LayerDiffuse that require transparency, you must select a format that supports an alpha channel (e.g., `PNG`, `WEBP`, `TIFF`). `JPG` does not support transparency.

**Allowed values**: `JPG` `PNG` `WEBP`

### [outputQuality](https://runware.ai/docs/models/alibaba-qwen2-5-vl-3b-instruct#request-outputquality)

- **Type**: `integer`
- **Min**: `20`
- **Max**: `99`
- **Default**: `95`

Compression quality of the output. Higher values preserve quality but increase file size.

### [webhookURL](https://runware.ai/docs/models/alibaba-qwen2-5-vl-3b-instruct#request-webhookurl)

- **Type**: `string`
- **Format**: `URI`

Specifies a webhook URL where JSON responses will be sent via HTTP POST when generation tasks complete. For batch requests with multiple results, each completed item triggers a separate webhook call as it becomes available.

**Learn more** (1 resource):

- [Webhooks](https://runware.ai/docs/platform/webhooks) (platform)

### [deliveryMethod](https://runware.ai/docs/models/alibaba-qwen2-5-vl-3b-instruct#request-deliverymethod)

- **Type**: `string`
- **Default**: `sync`

Determines how the API delivers task results.

**Allowed values**:

- `sync` Returns complete results directly in the API response.
- `async` Returns an immediate acknowledgment with the task UUID. Poll for results using `getResponse`.

**Learn more** (1 resource):

- [Task Polling](https://runware.ai/docs/platform/task-polling) (platform)

### [uploadEndpoint](https://runware.ai/docs/models/alibaba-qwen2-5-vl-3b-instruct#request-uploadendpoint)

- **Type**: `string`
- **Format**: `URI`

Specifies a URL where the generated content will be automatically uploaded using the HTTP PUT method. The raw binary data of the media file is sent directly as the request body. For secure uploads to cloud storage, use presigned URLs that include temporary authentication credentials.

**Common use cases:**

- **Cloud storage**: Upload directly to S3 buckets, Google Cloud Storage, or Azure Blob Storage using presigned URLs.
- **CDN integration**: Upload to content delivery networks for immediate distribution.

```text
// S3 presigned URL for secure upload
https://your-bucket.s3.amazonaws.com/generated/content.mp4?X-Amz-Signature=abc123&X-Amz-Expires=3600

// Google Cloud Storage presigned URL
https://storage.googleapis.com/your-bucket/content.jpg?X-Goog-Signature=xyz789

// Custom storage endpoint
https://storage.example.com/uploads/generated-image.jpg
```

The content data will be sent as the request body to the specified URL when generation is complete.

### [ttl](https://runware.ai/docs/models/alibaba-qwen2-5-vl-3b-instruct#request-ttl)

- **Type**: `integer`
- **Min**: `60`

Time-to-live (TTL) in seconds for generated content. Only applies when `outputType` is `URL`.

### [includeCost](https://runware.ai/docs/models/alibaba-qwen2-5-vl-3b-instruct#request-includecost)

- **Type**: `boolean`
- **Default**: `false`

Include task cost in the response.

**Inputs**

Input resources for the task (images, audio, etc). These must be nested inside the `inputs` object.

### [image](https://runware.ai/docs/models/alibaba-qwen2-5-vl-3b-instruct#request-inputs-image)

- **Path**: `inputs.image`
- **Type**: `string`
- **Required**: true

Image input (UUID, URL, Data URI, or Base64).
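When the image lives on disk rather than at a URL, one option is to pass it as a Data URI. The sketch below builds one with the standard library. The helper names `to_data_uri` and `file_to_data_uri` are hypothetical, and any size or format limits the platform enforces are not covered here.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 Data URI with the given MIME type."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def file_to_data_uri(path: str) -> str:
    """Read an image file and return it as a Data URI."""
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    return to_data_uri(Path(path).read_bytes(), mime_type)


# Tiny demonstration with placeholder bytes (not a real image).
print(to_data_uri(b"abc", "image/png"))  # prints data:image/png;base64,YWJj
```

The resulting string would go into `inputs.image` in place of a URL.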

**Generation Parameters**

Core parameters for controlling the generated content.

### [model](https://runware.ai/docs/models/alibaba-qwen2-5-vl-3b-instruct#request-model)

- **Type**: `string`
- **Required**: true
- **Value**: `runware:152@1`

Identifier of the model to use for generation.

**Learn more** (3 resources):

- [Text To Image: Model Selection The Foundation Of Generation](https://runware.ai/docs/guides/text-to-image#model-selection-the-foundation-of-generation) (guide)
- [Image Inpainting: Model Specialized Inpainting Models](https://runware.ai/docs/guides/image-inpainting#model-specialized-inpainting-models) (guide)
- [Image Outpainting: Other Critical Parameters](https://runware.ai/docs/guides/image-outpainting#other-critical-parameters) (guide)

### [prompt](https://runware.ai/docs/models/alibaba-qwen2-5-vl-3b-instruct#request-prompt)

- **Type**: `string`

Instructions or questions to guide the image analysis.
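Putting the request parameters together, a caption task object can be assembled as in the following sketch. The field names come from this page. The helper `build_caption_task` is hypothetical, and how the object is transmitted (endpoint, authentication, batching) is intentionally left out.

```python
import json
import uuid
from typing import Optional

MODEL_ID = "runware:152@1"  # model identifier listed on this page


def build_caption_task(
    image: str,
    prompt: Optional[str] = None,
    include_cost: bool = False,
) -> dict:
    """Build a caption task object using the parameters documented above."""
    task = {
        "taskType": "caption",
        "taskUUID": str(uuid.uuid4()),
        "model": MODEL_ID,
        "inputs": {"image": image},  # image must be nested inside inputs
    }
    if prompt:
        task["prompt"] = prompt  # optional guidance for the analysis
    if include_cost:
        task["includeCost"] = True
    return task


task = build_caption_task(
    "https://assets.runware.ai/assets/inputs/465571c4-411b-476e-a975-6e8cc06c397d.jpg",
    prompt="Describe the scene in 2-3 sentences.",
    include_cost=True,
)
print(json.dumps(task, indent=2))
```

Optional fields are omitted rather than sent as `null`, so the platform defaults apply.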

## Response Parameters

### [taskType](https://runware.ai/docs/models/alibaba-qwen2-5-vl-3b-instruct#response-tasktype)

- **Type**: `string`
- **Required**: true
- **Value**: `caption`

Type of the task.

### [taskUUID](https://runware.ai/docs/models/alibaba-qwen2-5-vl-3b-instruct#response-taskuuid)

- **Type**: `string`
- **Required**: true
- **Format**: `UUID v4`

UUID of the task.

### [text](https://runware.ai/docs/models/alibaba-qwen2-5-vl-3b-instruct#response-text)

- **Type**: `string`
- **Required**: true

Generated caption text describing the input.

### [cost](https://runware.ai/docs/models/alibaba-qwen2-5-vl-3b-instruct#response-cost)

- **Type**: `float`

Task cost in USD. Present when `includeCost` is set to `true` in the request.
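On the client side, a response object can be read into a small typed structure, treating `cost` as optional because it is only present when `includeCost` is `true`. This is a sketch; `CaptionResult`, `parse_caption_result`, and `total_cost` are hypothetical names, and the sample values are illustrative.

```python
import json
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class CaptionResult:
    task_uuid: str
    text: str
    cost: Optional[float] = None  # only set when includeCost was true


def parse_caption_result(item: dict) -> CaptionResult:
    """Validate the required response fields and wrap them in a dataclass."""
    missing = [key for key in ("taskType", "taskUUID", "text") if key not in item]
    if missing:
        raise ValueError(f"Response is missing required fields: {missing}")
    return CaptionResult(item["taskUUID"], item["text"], item.get("cost"))


def total_cost(results: Iterable[CaptionResult]) -> float:
    """Sum the reported costs in USD, skipping results without a cost."""
    return sum(r.cost for r in results if r.cost is not None)


raw = """[
  {"taskType": "caption", "taskUUID": "a", "text": "First caption.", "cost": 0.0026},
  {"taskType": "caption", "taskUUID": "b", "text": "Second caption."}
]"""
results = [parse_caption_result(item) for item in json.loads(raw)]
print(results[0].text, total_cost(results))
```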

## Examples

### Taxidermy Workshop Curiosity Cabinet (Caption)

**Request**:

```json
{
  "taskType": "caption",
  "taskUUID": "74c7c126-bc3d-4e55-b8fa-36e20e2a4e62",
  "model": "runware:152@1",
  "prompt": "Write a precise 5-7 sentence caption describing the scene, highlighting the main subject, notable objects, materials, lighting, and overall atmosphere. Mention spatial relationships clearly and avoid speculation beyond what is visible.",
  "inputs": {
    "image": "https://assets.runware.ai/assets/inputs/465571c4-411b-476e-a975-6e8cc06c397d.jpg"
  }
}
```

**Response**:

```json
{
  "taskType": "imageCaption",
  "taskUUID": "74c7c126-bc3d-4e55-b8fa-36e20e2a4e62",
  "text": "In a dimly lit, sepia-toned laboratory filled with shelves of jars containing various specimens, two birds stand on a wooden desk. The desk is cluttered with tools, including a magnifying glass, a small knife, and a green and red object that resembles a toy car. A large glass dome encases a collection of feathers and other natural items. To the left, a book lies open, while to the right, a skull and other anatomical models are displayed.",
  "cost": 0.0026
}
```

---

### Harbor Market Eclipse Tableau (Caption)

**Request**:

```json
{
  "taskType": "caption",
  "taskUUID": "3e62120b-9210-4117-9231-9f6bdbfecc8a",
  "model": "runware:152@1",
  "prompt": "Write a precise, vivid caption for this image in 2-3 sentences. Identify the main setting, the unusual sky event, notable human activity, and at least five specific visual details without guessing hidden context.",
  "inputs": {
    "image": "https://assets.runware.ai/assets/inputs/13dcd4da-6c25-4422-b900-b3abab18ce24.jpg"
  }
}
```

**Response**:

```json
{
  "taskType": "imageCaption",
  "taskUUID": "3e62120b-9210-4117-9231-9f6bdbfecc8a",
  "text": "In this vibrant seaside town, a dramatic solar eclipse casts a radiant halo over the bustling market, where vendors sell fresh produce and seafood. Children play near a makeshift telescope, while adults tend to their stalls under the enchanting sky. The town's architecture features colorful awnings and signs in Arabic script, adding to the lively atmosphere. A large ship sails past the dock, and seagulls soar above, enhancing the scene's dynamic energy.",
  "cost": 0.0026
}
```

---

### Salt-Flat Breakfast Balloonists (Caption)

**Request**:

```json
{
  "taskType": "caption",
  "taskUUID": "e340cc77-2772-4f13-8297-1a7dffcf525c",
  "model": "runware:152@1",
  "prompt": "Write a vivid, accurate caption for this image in 2-3 sentences. Describe the main subjects, setting, notable objects, colors, lighting, and the overall mood. Avoid speculation beyond what is clearly visible.",
  "inputs": {
    "image": "https://assets.runware.ai/assets/inputs/02da3f55-7a4b-4064-a905-ca325e849a70.jpg"
  }
}
```

**Response**:

```json
{
  "taskType": "imageCaption",
  "taskUUID": "e340cc77-2772-4f13-8297-1a7dffcf525c",
  "text": "Two individuals are enjoying a picnic on a vast, snowy plain under a clear sky. They are seated on patterned rugs, surrounded by a colorful hot air balloon festival. The balloons are vibrant, with hues of red, yellow, blue, and green, floating above them. The setting sun casts a warm glow over the scene, creating a serene and picturesque atmosphere.",
  "cost": 0.0019
}
```

---

### Rooftop Pigeon Racing Finale (Caption)

**Request**:

```json
{
  "taskType": "caption",
  "taskUUID": "66a9623a-dbd9-4db9-8975-592584b3d844",
  "model": "runware:152@1",
  "prompt": "Write a concise, vivid caption in 1-2 sentences describing the main action, setting, and mood of this image.",
  "inputs": {
    "image": "https://assets.runware.ai/assets/inputs/7590c546-9819-4c3b-b3a7-3b16838e1222.jpg"
  }
}
```

**Response**:

```json
{
  "taskType": "imageCaption",
  "taskUUID": "66a9623a-dbd9-4db9-8975-592584b3d844",
  "text": "A lively rooftop gathering features two men flying colorful kites amidst a flock of pigeons, with a backdrop of traditional buildings and a minaret silhouetted against the setting sun.",
  "cost": 0.0013
}
```