---
title: Qwen2.5-VL-7B-Instruct | Runware Docs
url: https://runware.ai/docs/models/alibaba-qwen2-5-vl-7b-instruct
description: Instruction-tuned multimodal vision-language model
---
# Qwen2.5-VL-7B-Instruct

Qwen2.5-VL-7B-Instruct is a multimodal model that processes images and text together to perform visual reasoning, captioning, question answering, and structured output generation. It pairs a vision encoder with a 7B-parameter instruction-tuned language backbone.

- **ID**: `runware:152@2`
- **Status**: live
- **Creator**: Alibaba
- **Release Date**: August 24, 2023
- **Capabilities**: Image to Text, Caption

## Pricing

- **95 - 105 tokens**: `$0.0019`

## Request Parameters

**API Options**

Platform-level options for task execution and delivery.

### [taskType](https://runware.ai/docs/models/alibaba-qwen2-5-vl-7b-instruct#request-tasktype)

- **Type**: `string`
- **Required**: true
- **Value**: `caption`

Identifier for the type of task being performed.

### [taskUUID](https://runware.ai/docs/models/alibaba-qwen2-5-vl-7b-instruct#request-taskuuid)

- **Type**: `string`
- **Required**: true
- **Format**: `UUID v4`

UUID v4 identifier for tracking tasks and matching async responses. Must be unique per task.
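As a minimal sketch, a UUID v4 value for `taskUUID` can be generated with Python's standard `uuid` module. The helper name `new_task_uuid` is illustrative and not part of the API.

```python
import uuid


def new_task_uuid() -> str:
    """Return a random UUID v4 string suitable for the taskUUID field."""
    return str(uuid.uuid4())


# Generate a fresh identifier for every task so responses can be matched.
task_uuid = new_task_uuid()
print(task_uuid)
```

Calling the helper once per task keeps identifiers unique, which is what allows an async response to be matched back to its request.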

### [outputType](https://runware.ai/docs/models/alibaba-qwen2-5-vl-7b-instruct#request-outputtype)

- **Type**: `string`
- **Default**: `URL`

Image output type.

**Allowed values**: `URL` `base64Data` `dataURI`

### [outputFormat](https://runware.ai/docs/models/alibaba-qwen2-5-vl-7b-instruct#request-outputformat)

- **Type**: `string`
- **Default**: `JPG`

Specifies the file format of the generated output. The available values depend on the task type and the specific model's capabilities.

- `JPG`: Best for photorealistic images with smaller file sizes (no transparency).
- `PNG`: Lossless compression, supports high quality and transparency (alpha channel).
- `WEBP`: Modern format providing superior compression and transparency support.

> [!NOTE]
> **Transparency**: If you are using features like background removal or LayerDiffuse that require transparency, you must select a format that supports an alpha channel (e.g., `PNG`, `WEBP`, `TIFF`). `JPG` does not support transparency.

**Allowed values**: `JPG` `PNG` `WEBP`

### [outputQuality](https://runware.ai/docs/models/alibaba-qwen2-5-vl-7b-instruct#request-outputquality)

- **Type**: `integer`
- **Min**: `20`
- **Max**: `99`
- **Default**: `95`

Compression quality of the output. Higher values preserve quality but increase file size.

### [webhookURL](https://runware.ai/docs/models/alibaba-qwen2-5-vl-7b-instruct#request-webhookurl)

- **Type**: `string`
- **Format**: `URI`

Specifies a webhook URL where JSON responses will be sent via HTTP POST when generation tasks complete. For batch requests with multiple results, each completed item triggers a separate webhook call as it becomes available.

**Learn more** (1 resource):

- [Webhooks](https://runware.ai/docs/platform/webhooks) (platform)
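A receiver for these POST calls might look like the sketch below, built only on Python's standard library. The payload shape is an assumption: the code accepts a bare result object, a list of results, or an object wrapping results in a `data` list, since the exact envelope is not specified on this page. The names `extract_results`, `WebhookHandler`, and `serve` are illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Any, Dict, List


def extract_results(raw_body: bytes) -> List[Dict[str, Any]]:
    """Decode a webhook body into a list of result objects.

    Assumption: the body is a single object, a list of objects,
    or an object with the results under a "data" key.
    """
    payload = json.loads(raw_body.decode("utf-8"))
    if isinstance(payload, dict) and isinstance(payload.get("data"), list):
        return payload["data"]
    if isinstance(payload, list):
        return payload
    return [payload]


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read exactly the number of bytes the sender announced.
        length = int(self.headers.get("Content-Length", 0))
        for item in extract_results(self.rfile.read(length)):
            print(item.get("taskUUID"), item.get("text"))
        # Acknowledge receipt with an empty 200 response.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted (hypothetical local setup)."""
    HTTPServer(("0.0.0.0", port), WebhookHandler).serve_forever()
```

Calling `serve()` starts the listener; the public URL that routes to it is what would go in `webhookURL`. Because each completed item triggers its own call, the handler treats every request independently.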

### [deliveryMethod](https://runware.ai/docs/models/alibaba-qwen2-5-vl-7b-instruct#request-deliverymethod)

- **Type**: `string`
- **Default**: `sync`

Determines how the API delivers task results.

**Allowed values**:

- `sync` Returns complete results directly in the API response.
- `async` Returns an immediate acknowledgment with the task UUID. Poll for results using getResponse.

**Learn more** (1 resource):

- [Task Polling](https://runware.ai/docs/platform/task-polling) (platform)
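The polling flow for `async` delivery can be sketched as a small loop. This is an illustration under stated assumptions, not the platform's client: `send` is a hypothetical transport callable you supply, the poll task is assumed to carry `taskType: "getResponse"` plus the original `taskUUID`, and results are assumed to come back in a `data` list.

```python
import time
from typing import Any, Callable, Dict, List


def poll_for_result(
    send: Callable[[List[Dict[str, Any]]], Dict[str, Any]],
    task_uuid: str,
    max_attempts: int = 10,
    delay_seconds: float = 1.0,
) -> Dict[str, Any]:
    """Poll until a result carrying `text` appears for `task_uuid`.

    `send` is a caller-provided function that posts a list of tasks
    and returns the decoded JSON reply (assumed envelope: {"data": [...]}).
    """
    poll_task = {"taskType": "getResponse", "taskUUID": task_uuid}
    for _ in range(max_attempts):
        reply = send([poll_task])
        for item in reply.get("data", []):
            # A finished caption task is recognized by its `text` field.
            if item.get("taskUUID") == task_uuid and "text" in item:
                return item
        time.sleep(delay_seconds)
    raise TimeoutError(f"No result for task {task_uuid} after {max_attempts} attempts")
```

Injecting the transport keeps the loop independent of any particular HTTP or WebSocket client, and makes it easy to exercise with a stub.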

### [uploadEndpoint](https://runware.ai/docs/models/alibaba-qwen2-5-vl-7b-instruct#request-uploadendpoint)

- **Type**: `string`
- **Format**: `URI`

Specifies a URL where the generated content will be automatically uploaded using the HTTP PUT method. The raw binary data of the media file is sent directly as the request body. For secure uploads to cloud storage, use presigned URLs that include temporary authentication credentials.

**Common use cases:**

- **Cloud storage**: Upload directly to S3 buckets, Google Cloud Storage, or Azure Blob Storage using presigned URLs.
- **CDN integration**: Upload to content delivery networks for immediate distribution.

```text
// S3 presigned URL for secure upload
https://your-bucket.s3.amazonaws.com/generated/content.mp4?X-Amz-Signature=abc123&X-Amz-Expires=3600

// Google Cloud Storage presigned URL
https://storage.googleapis.com/your-bucket/content.jpg?X-Goog-Signature=xyz789

// Custom storage endpoint
https://storage.example.com/uploads/generated-image.jpg
```

The content data will be sent as the request body to the specified URL when generation is complete.

### [ttl](https://runware.ai/docs/models/alibaba-qwen2-5-vl-7b-instruct#request-ttl)

- **Type**: `integer`
- **Min**: `60`

Time-to-live (TTL) in seconds for generated content. Only applies when `outputType` is `URL`.

### [includeCost](https://runware.ai/docs/models/alibaba-qwen2-5-vl-7b-instruct#request-includecost)

- **Type**: `boolean`
- **Default**: `false`

Include task cost in the response.
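The constraints listed for the API options above can be checked client-side before sending a request. The sketch below encodes the documented allowed values, ranges, and defaults; the function name `validate_api_options` is illustrative, and flagging `ttl` when `outputType` is not `URL` is a choice made here, since this page only says the parameter does not apply in that case.

```python
from typing import Any, Dict, List

ALLOWED_OUTPUT_TYPES = {"URL", "base64Data", "dataURI"}
ALLOWED_OUTPUT_FORMATS = {"JPG", "PNG", "WEBP"}
ALLOWED_DELIVERY_METHODS = {"sync", "async"}


def _is_int(value: Any) -> bool:
    """True for real integers; bool is excluded even though it subclasses int."""
    return isinstance(value, int) and not isinstance(value, bool)


def validate_api_options(options: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in the platform-level options."""
    errors: List[str] = []

    output_type = options.get("outputType", "URL")
    if output_type not in ALLOWED_OUTPUT_TYPES:
        errors.append(f"outputType must be one of {sorted(ALLOWED_OUTPUT_TYPES)}")

    if options.get("outputFormat", "JPG") not in ALLOWED_OUTPUT_FORMATS:
        errors.append(f"outputFormat must be one of {sorted(ALLOWED_OUTPUT_FORMATS)}")

    if options.get("deliveryMethod", "sync") not in ALLOWED_DELIVERY_METHODS:
        errors.append(f"deliveryMethod must be one of {sorted(ALLOWED_DELIVERY_METHODS)}")

    quality = options.get("outputQuality", 95)
    if not _is_int(quality) or not 20 <= quality <= 99:
        errors.append("outputQuality must be an integer between 20 and 99")

    ttl = options.get("ttl")
    if ttl is not None:
        if not _is_int(ttl) or ttl < 60:
            errors.append("ttl must be an integer of at least 60 seconds")
        elif output_type != "URL":
            errors.append("ttl has no effect unless outputType is URL")

    return errors
```

An empty list means the options satisfy every constraint documented in this section.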

**Inputs**

Input resources for the task (images, audio, etc.). These must be nested inside the `inputs` object.

### [image](https://runware.ai/docs/models/alibaba-qwen2-5-vl-7b-instruct#request-inputs-image)

- **Path**: `inputs.image`
- **Type**: `string`
- **Required**: true

Image input (UUID, URL, Data URI, or Base64).
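For a local file, one way to produce the Data URI form is to Base64-encode the bytes and prefix the MIME type. This is a minimal sketch using only the standard library; the helper name `image_to_data_uri` is illustrative, and the MIME type is guessed from the file extension rather than the file contents.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path: str) -> str:
    """Encode a local image file as a data URI string."""
    file_path = Path(path)
    # Guess the MIME type from the extension, e.g. ".png" -> "image/png".
    mime_type, _ = mimetypes.guess_type(file_path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {path!r}")
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be placed directly in `inputs.image`. For large files, passing a URL instead avoids inflating the request body, since Base64 adds roughly a third to the payload size.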

**Generation Parameters**

Core parameters for controlling the generated content.

### [model](https://runware.ai/docs/models/alibaba-qwen2-5-vl-7b-instruct#request-model)

- **Type**: `string`
- **Required**: true
- **Value**: `runware:152@2`

Identifier of the model to use for generation.

**Learn more** (3 resources):

- [Text To Image: Model Selection The Foundation Of Generation](https://runware.ai/docs/guides/text-to-image#model-selection-the-foundation-of-generation) (guide)
- [Image Inpainting: Model Specialized Inpainting Models](https://runware.ai/docs/guides/image-inpainting#model-specialized-inpainting-models) (guide)
- [Image Outpainting: Other Critical Parameters](https://runware.ai/docs/guides/image-outpainting#other-critical-parameters) (guide)

### [prompt](https://runware.ai/docs/models/alibaba-qwen2-5-vl-7b-instruct#request-prompt)

- **Type**: `string`

Instructions or questions to guide the image analysis.
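Putting the required fields together, a caption task can be assembled as in the sketch below. The helper name `build_caption_task` is illustrative; it only constructs the task object and does not send it, since the transport and authentication are outside the scope of this page.

```python
import json
import uuid
from typing import Any, Dict, Optional

MODEL_ID = "runware:152@2"


def build_caption_task(
    image: str,
    prompt: Optional[str] = None,
    include_cost: bool = False,
) -> Dict[str, Any]:
    """Build a caption task for Qwen2.5-VL-7B-Instruct.

    `image` may be a UUID, URL, data URI, or Base64 string.
    """
    task: Dict[str, Any] = {
        "taskType": "caption",
        "taskUUID": str(uuid.uuid4()),
        "model": MODEL_ID,
        # Input resources must be nested inside the `inputs` object.
        "inputs": {"image": image},
    }
    # Optional fields are added only when set, leaving API defaults in place.
    if prompt:
        task["prompt"] = prompt
    if include_cost:
        task["includeCost"] = True
    return task


task = build_caption_task(
    "https://example.com/photo.jpg",  # placeholder URL
    prompt="Describe the setting and main activity in 2-3 sentences.",
    include_cost=True,
)
print(json.dumps(task, indent=2))
```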

## Response Parameters

### [taskType](https://runware.ai/docs/models/alibaba-qwen2-5-vl-7b-instruct#response-tasktype)

- **Type**: `string`
- **Required**: true
- **Value**: `caption`

Type of the task.

### [taskUUID](https://runware.ai/docs/models/alibaba-qwen2-5-vl-7b-instruct#response-taskuuid)

- **Type**: `string`
- **Required**: true
- **Format**: `UUID v4`

UUID of the task.

### [text](https://runware.ai/docs/models/alibaba-qwen2-5-vl-7b-instruct#response-text)

- **Type**: `string`
- **Required**: true

Generated caption text describing the input.

### [cost](https://runware.ai/docs/models/alibaba-qwen2-5-vl-7b-instruct#response-cost)

- **Type**: `float`

Task cost in USD. Present when `includeCost` is set to `true` in the request.
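On the client side, a result object with these fields can be checked and converted into a typed value. The sketch below is one possible approach; `CaptionResult` and `parse_caption_result` are illustrative names, and the function expects a single result object rather than any surrounding envelope.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class CaptionResult:
    task_uuid: str
    text: str
    cost: Optional[float] = None  # only present when includeCost was true


def parse_caption_result(item: Dict[str, Any]) -> CaptionResult:
    """Validate the required response fields and return a typed result."""
    missing = [key for key in ("taskType", "taskUUID", "text") if key not in item]
    if missing:
        raise ValueError(f"Response is missing required fields: {missing}")
    cost = item.get("cost")
    return CaptionResult(
        task_uuid=item["taskUUID"],
        text=item["text"],
        cost=float(cost) if cost is not None else None,
    )
```

Keeping `cost` optional mirrors the response definition: the field appears only when the request set `includeCost` to `true`.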

## Examples

### Harbor Teahouse After Storm (Caption)

**Request**:

```json
{
  "taskType": "caption",
  "taskUUID": "eb705042-61a2-4922-8945-642ebf948d57",
  "model": "runware:152@2",
  "prompt": "Write a rich caption for this image in 2-3 sentences. Identify the primary setting, the main human activity, notable background details, and the overall mood. Keep the description grounded in visible evidence and avoid speculation.",
  "inputs": {
    "image": "https://assets.runware.ai/assets/inputs/ddf33190-1c7c-4b47-9207-d219fdd0491c.jpg"
  }
}
```

**Response**:

```json
{
  "taskType": "imageCaption",
  "taskUUID": "eb705042-61a2-4922-8945-642ebf948d57",
  "text": "The image captures a serene waterfront scene where three individuals are engaged in different activities. The central figure, wearing an apron, is meticulously arranging small cups on a table, suggesting he might be preparing tea or coffee. To his right, two men are seated at another table, engrossed in examining a map, possibly planning a journey or discussing directions. The backdrop features a misty harbor filled with docked fishing boats, adding a sense of calm and anticipation to the setting.",
  "cost": 0.0019
}
```

---

### Moonlit Archive Courtyard Tableau (Caption)

**Request**:

```json
{
  "taskType": "caption",
  "taskUUID": "3e5eacba-0586-498a-a217-b3eb1135c99f",
  "model": "runware:152@2",
  "prompt": "Write a concise but vivid caption for this image. Mention the setting, the main people or animals present, and the most notable objects or activities without speculating beyond what is visible.",
  "inputs": {
    "image": "https://assets.runware.ai/assets/inputs/cfa6e306-886f-444b-b3bd-1d4d4f1de7eb.jpg"
  }
}
```

**Response**:

```json
{
  "taskType": "imageCaption",
  "taskUUID": "3e5eacba-0586-498a-a217-b3eb1135c99f",
  "text": "A medieval courtyard scene under a night sky with a crescent moon, where scholars in period attire study ancient manuscripts by candlelight. The central table is cluttered with scrolls, books, and scholarly tools, while a white cat perches on a fountain nearby. Ivy-clad walls and hanging papers add to the historical ambiance.",
  "cost": 0.0013
}
```

---

### Fossil Hall Field Notes (Caption)

**Request**:

```json
{
  "taskType": "caption",
  "taskUUID": "74654116-0ef8-4f02-ba4b-ffcbd9e8fb32",
  "model": "runware:152@2",
  "prompt": "Write a rich, accurate caption for this image. Identify the main subject, notable secondary details, the setting, lighting, and the overall educational atmosphere. Mention any visible human activity and the relationship between the fossil displays and visitors. Keep it concise but specific.",
  "inputs": {
    "image": "https://assets.runware.ai/assets/inputs/3a9f575c-904c-417e-a3e1-cf347cc83f10.jpg"
  }
}
```

**Response**:

```json
{
  "taskType": "imageCaption",
  "taskUUID": "74654116-0ef8-4f02-ba4b-ffcbd9e8fb32",
  "text": "The image captures a lively educational setting within a museum, featuring a large Tyrannosaurus rex skeleton as the centerpiece. The T-Rex is mounted high above the floor, dominating the space with its imposing presence. Below, a group of students, some holding notebooks, attentively listen to a guide dressed in a yellow cardigan who gestures towards the dinosaur's anatomy. The background showcases glass display cases containing various fossils, including ammonites and other marine life, adding depth to the exhibit.",
  "cost": 0.0019
}
```

---

### Tidal Observatory Reading Room (Caption)

**Request**:

```json
{
  "taskType": "caption",
  "taskUUID": "2ce3fe26-9703-4876-99b4-d6d317efb096",
  "model": "runware:152@2",
  "prompt": "Write a vivid 2-3 sentence caption describing the main setting, notable objects, and human activity in the image. Keep it natural, specific, and observational.",
  "inputs": {
    "image": "https://assets.runware.ai/assets/inputs/697d2c97-482e-4c15-af86-1a2df26db98b.jpg"
  }
}
```

**Response**:

```json
{
  "taskType": "imageCaption",
  "taskUUID": "2ce3fe26-9703-4876-99b4-d6d317efb096",
  "text": "In this serene and scholarly room, two individuals are engrossed in their work amidst a collection of books, maps, and scientific instruments. A cat lounges comfortably on a stool near a telescope, while seagulls fly outside the large arched windows, suggesting a coastal location. The atmosphere is one of quiet concentration and intellectual pursuit.",
  "cost": 0.0013
}
```