MODEL ID alibaba-qwen2-5-vl-3b-instruct
live

Qwen2.5-VL-3B-Instruct

by Alibaba

Qwen2.5-VL-3B-Instruct is a multimodal model that takes images and text as input and handles visual reasoning, captioning, question answering, and structured-output tasks. It pairs a vision encoder with an instruction-tuned language backbone, so it can follow a text prompt about an image and answer in natural language.

Caption

Taxidermy Workshop Curiosity Cabinet

In a dimly lit, sepia-toned laboratory filled with shelves of jars containing various specimens, two birds stand on a wooden desk. The desk is cluttered with tools, including a magnifying glass, a small knife, and a green and red object that resembles a toy car. A large glass dome encases a collection of feathers and other natural items. To the left, a book lies open, while to the right, a skull and other anatomical models are displayed.
{
  "taskType": "caption",
  "taskUUID": "74c7c126-bc3d-4e55-b8fa-36e20e2a4e62",
  "model": "runware:152@1",
  "prompt": "Write a precise 5-7 sentence caption describing the scene, highlighting the main subject, notable objects, materials, lighting, and overall atmosphere. Mention spatial relationships clearly and avoid speculation beyond what is visible.",
  "inputs": {
    "image": "https://assets.runware.ai/assets/inputs/465571c4-411b-476e-a975-6e8cc06c397d.jpg"
  }
}
{
  "taskType": "imageCaption",
  "taskUUID": "74c7c126-bc3d-4e55-b8fa-36e20e2a4e62",
  "text": "In a dimly lit, sepia-toned laboratory filled with shelves of jars containing various specimens, two birds stand on a wooden desk. The desk is cluttered with tools, including a magnifying glass, a small knife, and a green and red object that resembles a toy car. A large glass dome encases a collection of feathers and other natural items. To the left, a book lies open, while to the right, a skull and other anatomical models are displayed.",
  "cost": 0.0026
}
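The request and response pair above can be assembled and read with a few lines of Python. This is a minimal sketch, not an official client: the helper names `build_caption_task` and `read_caption` are hypothetical, and because the page does not show how a task is transmitted (endpoint, authentication, or any wrapping of the payload), the sketch stops short of any network call. It relies only on the field names visible in the JSON above.

```python
import json
import uuid


def build_caption_task(image_url: str, prompt: str, model: str = "runware:152@1") -> dict:
    """Assemble a caption task with the same fields as the request shown above.

    Hypothetical helper: only the field names are taken from the page.
    """
    return {
        "taskType": "caption",
        "taskUUID": str(uuid.uuid4()),  # fresh identifier for each task
        "model": model,
        "prompt": prompt,
        "inputs": {"image": image_url},
    }


def read_caption(response: dict, expected_uuid: str) -> tuple[str, float]:
    """Return (caption text, cost) after checking the response belongs to our task."""
    if response.get("taskUUID") != expected_uuid:
        raise ValueError("response taskUUID does not match the request")
    return response["text"], float(response["cost"])


task = build_caption_task(
    "https://assets.runware.ai/assets/inputs/465571c4-411b-476e-a975-6e8cc06c397d.jpg",
    "Write a precise 5-7 sentence caption describing the scene.",
)
print(json.dumps(task, indent=2))
```

Checking `taskUUID` before reading `text` reflects what the examples on this page show: each response carries the same identifier as the request that produced it.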
Caption

Harbor Market Eclipse Tableau

In this vibrant seaside town, a dramatic solar eclipse casts a radiant halo over the bustling market, where vendors sell fresh produce and seafood. Children play near a makeshift telescope, while adults tend to their stalls under the enchanting sky. The town's architecture features colorful awnings and signs in Arabic script, adding to the lively atmosphere. A large ship sails past the dock, and seagulls soar above, enhancing the scene's dynamic energy.
{
  "taskType": "caption",
  "taskUUID": "3e62120b-9210-4117-9231-9f6bdbfecc8a",
  "model": "runware:152@1",
  "prompt": "Write a precise, vivid caption for this image in 2-3 sentences. Identify the main setting, the unusual sky event, notable human activity, and at least five specific visual details without guessing hidden context.",
  "inputs": {
    "image": "https://assets.runware.ai/assets/inputs/13dcd4da-6c25-4422-b900-b3abab18ce24.jpg"
  }
}
{
  "taskType": "imageCaption",
  "taskUUID": "3e62120b-9210-4117-9231-9f6bdbfecc8a",
  "text": "In this vibrant seaside town, a dramatic solar eclipse casts a radiant halo over the bustling market, where vendors sell fresh produce and seafood. Children play near a makeshift telescope, while adults tend to their stalls under the enchanting sky. The town's architecture features colorful awnings and signs in Arabic script, adding to the lively atmosphere. A large ship sails past the dock, and seagulls soar above, enhancing the scene's dynamic energy.",
  "cost": 0.0026
}
Caption

Salt-Flat Breakfast Balloonists

Two individuals are enjoying a picnic on a vast, snowy plain under a clear sky. They are seated on patterned rugs, surrounded by a colorful hot air balloon festival. The balloons are vibrant, with hues of red, yellow, blue, and green, floating above them. The setting sun casts a warm glow over the scene, creating a serene and picturesque atmosphere.
{
  "taskType": "caption",
  "taskUUID": "e340cc77-2772-4f13-8297-1a7dffcf525c",
  "model": "runware:152@1",
  "prompt": "Write a vivid, accurate caption for this image in 2-3 sentences. Describe the main subjects, setting, notable objects, colors, lighting, and the overall mood. Avoid speculation beyond what is clearly visible.",
  "inputs": {
    "image": "https://assets.runware.ai/assets/inputs/02da3f55-7a4b-4064-a905-ca325e849a70.jpg"
  }
}
{
  "taskType": "imageCaption",
  "taskUUID": "e340cc77-2772-4f13-8297-1a7dffcf525c",
  "text": "Two individuals are enjoying a picnic on a vast, snowy plain under a clear sky. They are seated on patterned rugs, surrounded by a colorful hot air balloon festival. The balloons are vibrant, with hues of red, yellow, blue, and green, floating above them. The setting sun casts a warm glow over the scene, creating a serene and picturesque atmosphere.",
  "cost": 0.0019
}
Caption

Rooftop Pigeon Racing Finale

A lively rooftop gathering features two men flying colorful kites amidst a flock of pigeons, with a backdrop of traditional buildings and a minaret silhouetted against the setting sun.
{
  "taskType": "caption",
  "taskUUID": "66a9623a-dbd9-4db9-8975-592584b3d844",
  "model": "runware:152@1",
  "prompt": "Write a concise, vivid caption in 1-2 sentences describing the main action, setting, and mood of this image.",
  "inputs": {
    "image": "https://assets.runware.ai/assets/inputs/7590c546-9819-4c3b-b3a7-3b16838e1222.jpg"
  }
}
{
  "taskType": "imageCaption",
  "taskUUID": "66a9623a-dbd9-4db9-8975-592584b3d844",
  "text": "A lively rooftop gathering features two men flying colorful kites amidst a flock of pigeons, with a backdrop of traditional buildings and a minaret silhouetted against the setting sun.",
  "cost": 0.0013
}
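Each response on this page repeats the `taskUUID` of its request and reports a `cost`, so results can be looked up by identifier and the costs added up. The following is a small sketch under those assumptions; the helper names `index_by_uuid` and `total_cost` are hypothetical, and the values are copied from the four responses above. The page does not state the unit of `cost`, so none is assumed here.

```python
from decimal import Decimal

# taskUUID and cost values copied from the four responses on this page.
responses = [
    {"taskUUID": "74c7c126-bc3d-4e55-b8fa-36e20e2a4e62", "cost": 0.0026},
    {"taskUUID": "3e62120b-9210-4117-9231-9f6bdbfecc8a", "cost": 0.0026},
    {"taskUUID": "e340cc77-2772-4f13-8297-1a7dffcf525c", "cost": 0.0019},
    {"taskUUID": "66a9623a-dbd9-4db9-8975-592584b3d844", "cost": 0.0013},
]


def index_by_uuid(items: list[dict]) -> dict[str, dict]:
    """Map each taskUUID to its response so a result can be matched to its request."""
    return {item["taskUUID"]: item for item in items}


def total_cost(items: list[dict]) -> Decimal:
    """Sum the cost fields; Decimal(str(x)) avoids binary floating-point drift."""
    return sum((Decimal(str(item["cost"])) for item in items), Decimal("0"))


by_uuid = index_by_uuid(responses)
print(by_uuid["66a9623a-dbd9-4db9-8975-592584b3d844"]["cost"])  # 0.0013
print(total_cost(responses))  # 0.0084
```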