Qwen2.5-VL-7B-Instruct
Qwen2.5-VL-7B-Instruct is a multimodal model that processes images and text together to perform visual reasoning, captioning, question answering, and structured output generation. It pairs a vision encoder with a 7B-parameter instruction-tuned language backbone for interactive multimodal understanding.
Examples
Ready-to-use code snippets for common workflows
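Each example below pairs a caption request with its response. As a minimal sketch, a request of that shape could be assembled in Python as follows. The helper name `build_caption_task` is hypothetical, and generating a version 4 UUID for `taskUUID` is an assumption based on the identifiers shown in the examples. How the task is transmitted (endpoint, authentication) is not covered on this page and is deliberately left out.

```python
import json
import uuid


def build_caption_task(image_url: str, prompt: str, model: str = "runware:152@2") -> dict:
    """Return a caption task shaped like the example requests on this page.

    Hypothetical helper: the field names mirror the examples, nothing more.
    """
    return {
        "taskType": "caption",
        "taskUUID": str(uuid.uuid4()),  # assumption: a fresh v4 UUID per task
        "model": model,
        "prompt": prompt,
        "inputs": {"image": image_url},
    }


if __name__ == "__main__":
    task = build_caption_task(
        "https://assets.runware.ai/assets/inputs/ddf33190-1c7c-4b47-9207-d219fdd0491c.jpg",
        "Write a rich caption for this image in 2-3 sentences.",
    )
    # Pretty-print the task so it can be compared with the examples.
    print(json.dumps(task, indent=2))
```

Keeping the `taskUUID` generation inside the helper means each call yields a distinct identifier, which is what allows a response to be matched back to its request.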
Harbor Teahouse After Storm
Request:
{
  "taskType": "caption",
  "taskUUID": "eb705042-61a2-4922-8945-642ebf948d57",
  "model": "runware:152@2",
  "prompt": "Write a rich caption for this image in 2-3 sentences. Identify the primary setting, the main human activity, notable background details, and the overall mood. Keep the description grounded in visible evidence and avoid speculation.",
  "inputs": {
    "image": "https://assets.runware.ai/assets/inputs/ddf33190-1c7c-4b47-9207-d219fdd0491c.jpg"
  }
}

Response:
{
  "taskType": "imageCaption",
  "taskUUID": "eb705042-61a2-4922-8945-642ebf948d57",
  "text": "The image captures a serene waterfront scene where three individuals are engaged in different activities. The central figure, wearing an apron, is meticulously arranging small cups on a table, suggesting he might be preparing tea or coffee. To his right, two men are seated at another table, engrossed in examining a map, possibly planning a journey or discussing directions. The backdrop features a misty harbor filled with docked fishing boats, adding a sense of calm and anticipation to the setting.",
  "cost": 0.0019
}

Moonlit Archive Courtyard Tableau
Request:
{
  "taskType": "caption",
  "taskUUID": "3e5eacba-0586-498a-a217-b3eb1135c99f",
  "model": "runware:152@2",
  "prompt": "Write a concise but vivid caption for this image. Mention the setting, the main people or animals present, and the most notable objects or activities without speculating beyond what is visible.",
  "inputs": {
    "image": "https://assets.runware.ai/assets/inputs/cfa6e306-886f-444b-b3bd-1d4d4f1de7eb.jpg"
  }
}

Response:
{
  "taskType": "imageCaption",
  "taskUUID": "3e5eacba-0586-498a-a217-b3eb1135c99f",
  "text": "A medieval courtyard scene under a night sky with a crescent moon, where scholars in period attire study ancient manuscripts by candlelight. The central table is cluttered with scrolls, books, and scholarly tools, while a white cat perches on a fountain nearby. Ivy-clad walls and hanging papers add to the historical ambiance.",
  "cost": 0.0013
}

Fossil Hall Field Notes
Request:
{
  "taskType": "caption",
  "taskUUID": "74654116-0ef8-4f02-ba4b-ffcbd9e8fb32",
  "model": "runware:152@2",
  "prompt": "Write a rich, accurate caption for this image. Identify the main subject, notable secondary details, the setting, lighting, and the overall educational atmosphere. Mention any visible human activity and the relationship between the fossil displays and visitors. Keep it concise but specific.",
  "inputs": {
    "image": "https://assets.runware.ai/assets/inputs/3a9f575c-904c-417e-a3e1-cf347cc83f10.jpg"
  }
}

Response:
{
  "taskType": "imageCaption",
  "taskUUID": "74654116-0ef8-4f02-ba4b-ffcbd9e8fb32",
  "text": "The image captures a lively educational setting within a museum, featuring a large Tyrannosaurus rex skeleton as the centerpiece. The T-Rex is mounted high above the floor, dominating the space with its imposing presence. Below, a group of students, some holding notebooks, attentively listen to a guide dressed in a yellow cardigan who gestures towards the dinosaur's anatomy. The background showcases glass display cases containing various fossils, including ammonites and other marine life, adding depth to the exhibit.",
  "cost": 0.0019
}

Tidal Observatory Reading Room
Request:
{
  "taskType": "caption",
  "taskUUID": "2ce3fe26-9703-4876-99b4-d6d317efb096",
  "model": "runware:152@2",
  "prompt": "Write a vivid 2-3 sentence caption describing the main setting, notable objects, and human activity in the image. Keep it natural, specific, and observational.",
  "inputs": {
    "image": "https://assets.runware.ai/assets/inputs/697d2c97-482e-4c15-af86-1a2df26db98b.jpg"
  }
}

Response:
{
  "taskType": "imageCaption",
  "taskUUID": "2ce3fe26-9703-4876-99b4-d6d317efb096",
  "text": "In this serene and scholarly room, two individuals are engrossed in their work amidst a collection of books, maps, and scientific instruments. A cat lounges comfortably on a stool near a telescope, while seagulls fly outside the large arched windows, suggesting a coastal location. The atmosphere is one of quiet concentration and intellectual pursuit.",
  "cost": 0.0013
}
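
In every example above, the response repeats the `taskUUID` of its request and reports a `cost`. The sketch below shows one way to use those two fields: index responses by `taskUUID` and total the reported costs. The function names are hypothetical, the `text` fields are omitted for brevity, and the unit of `cost` is not stated on this page, so the total is reported as a bare number.

```python
from decimal import Decimal


def index_by_task_uuid(responses: list[dict]) -> dict[str, dict]:
    """Map each response to its taskUUID so it can be matched to a request."""
    return {r["taskUUID"]: r for r in responses}


def total_cost(responses: list[dict]) -> Decimal:
    """Sum the "cost" fields via Decimal to avoid float rounding drift."""
    return sum((Decimal(str(r["cost"])) for r in responses), Decimal("0"))


# The four responses from this page, with "text" omitted for brevity.
responses = [
    {"taskType": "imageCaption", "taskUUID": "eb705042-61a2-4922-8945-642ebf948d57", "cost": 0.0019},
    {"taskType": "imageCaption", "taskUUID": "3e5eacba-0586-498a-a217-b3eb1135c99f", "cost": 0.0013},
    {"taskType": "imageCaption", "taskUUID": "74654116-0ef8-4f02-ba4b-ffcbd9e8fb32", "cost": 0.0019},
    {"taskType": "imageCaption", "taskUUID": "2ce3fe26-9703-4876-99b4-d6d317efb096", "cost": 0.0013},
]

by_id = index_by_task_uuid(responses)
print(by_id["74654116-0ef8-4f02-ba4b-ffcbd9e8fb32"]["cost"])  # prints 0.0019
print(total_cost(responses))  # prints 0.0064
```

Converting each cost through `str` before building the `Decimal` keeps the sum at exactly the printed values rather than their binary floating-point approximations.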