Image to Text

Generate image descriptions using Runware's API. Explore how our AI technology analyzes images to produce accurate and concise captions.

Introduction

Image to text, also known as image captioning, produces a descriptive text prompt from an uploaded or previously generated image. The resulting description can be used as a prompt to create additional images or to summarize the visual content.

Request

Our API always accepts an array of objects as input, where each object represents a specific task to be performed. The structure of the object varies depending on the type of the task. For this section, we will focus on the parameters related to the image to text task.

The following JSON snippet shows the basic structure of a request object. All properties are explained in detail in the next section.

[
  {
    "taskType": "imageCaption",
    "taskUUID": "f0a5574f-d653-47f1-ab42-e2c1631f1a47",
    "inputImage": "5788104a-1ca7-4b7e-8a16-b27b57e86f87"
  }
]

taskType

string required

The type of task to be performed. For this task, the value should be imageCaption.

taskUUID

string required UUID v4

When a task is sent to the API, you must include a random UUID v4 string in the taskUUID parameter. This string is used to match the asynchronous responses to their corresponding tasks.

If you send multiple tasks at the same time, the taskUUID will help you match the responses to the correct tasks.

The taskUUID must be unique for each task you send to the API.
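As a minimal sketch of the taskUUID rules above, the following Python snippet builds a request array with a fresh UUID v4 per task. The helper name build_caption_task and the second image UUID are hypothetical and used only for illustration. Sending the payload is left out, since the transport is not described in this section.

```python
import json
import uuid


def build_caption_task(input_image: str) -> dict:
    """Build one imageCaption task object with its own random UUID v4."""
    return {
        "taskType": "imageCaption",
        "taskUUID": str(uuid.uuid4()),  # must be unique for each task
        "inputImage": input_image,
    }


# The API accepts an array of task objects, so several tasks can share one request.
tasks = [
    build_caption_task("5788104a-1ca7-4b7e-8a16-b27b57e86f87"),
    build_caption_task("11111111-2222-4333-8444-555555555555"),  # hypothetical image UUID
]

# Keep the taskUUIDs so the responses can be matched to their tasks later.
tasks_by_uuid = {task["taskUUID"]: task for task in tasks}

payload = json.dumps(tasks, indent=2)
```

Keeping a lookup such as tasks_by_uuid on the client side is one way to match responses to tasks when several are in flight at the same time.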

includeCost

boolean Default: false

If set to true, the cost to perform the task will be included in the response object.

inputImage

string required

Specifies the input image to be processed. The image can be specified in one of the following formats:

  • A UUID v4 string of a previously uploaded image or a generated image.
  • A data URI string representing the image. The data URI must be in the format data:<mediaType>;base64, followed by the base64-encoded image. For example: ....
  • A base64 encoded image without the data URI prefix. For example: iVBORw0KGgo....
  • A URL pointing to the image. The image must be publicly accessible.

Supported formats are: PNG, JPG and WEBP.
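The data URI option can be sketched in Python as follows. This is an illustration under assumptions: the helper to_data_uri and the MEDIA_TYPES mapping are hypothetical names, and the mapping uses the standard media types for the three supported formats rather than a list confirmed by the API.

```python
import base64

# Standard media types for the supported formats (assumed, not confirmed by the API).
MEDIA_TYPES = {
    "png": "image/png",
    "jpg": "image/jpeg",
    "jpeg": "image/jpeg",
    "webp": "image/webp",
}


def to_data_uri(image_bytes: bytes, extension: str) -> str:
    """Encode raw image bytes as data:<mediaType>;base64,<payload>."""
    media_type = MEDIA_TYPES[extension.lower().lstrip(".")]
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


# The eight-byte signature that starts every PNG file, used here as stand-in data.
png_signature = b"\x89PNG\r\n\x1a\n"
data_uri = to_data_uri(png_signature, "png")

# The same bytes without the data URI prefix, matching the plain base64 option.
plain_base64 = base64.b64encode(png_signature).decode("ascii")
```

In practice image_bytes would come from reading the whole image file in binary mode. Either string can then be used as the inputImage value.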

Response

Results will be delivered in the format below.

{
  "data": [
    {
      "taskType": "imageCaption",
      "taskUUID": "f0a5574f-d653-47f1-ab42-e2c1631f1a47",
      "text": "arafed troll in the jungle with a backpack and a stick, cgi animation, cinematic movie image, gremlin, pixie character, nvidia promotional image, park background, with lots of scumbling, hollywood promotional image, on island, chesley, green fog, post-nuclear",
      "cost": 0
    }
  ]
}

taskType

string

The type of task that was performed. For this task, the value is imageCaption.

taskUUID

string UUID v4

The API will return the taskUUID you sent in the request. This way you can match the responses to the correct request tasks.

text

string

The resulting text or prompt from interrogating the image.

cost

float

If includeCost is set to true, the response will include a cost field for each task object. This field indicates the cost of the request in USD.
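The response fields above can be consumed roughly as shown in this Python sketch. It assumes the response body has already been received as a JSON string. The names collect_captions and pending_tasks are hypothetical, and the caption text in the sample is shortened. The cost field is read defensively because it is only present when includeCost was set to true.

```python
import json

# Sample response body shaped like the example above (caption text shortened).
raw_response = """
{
  "data": [
    {
      "taskType": "imageCaption",
      "taskUUID": "f0a5574f-d653-47f1-ab42-e2c1631f1a47",
      "text": "arafed troll in the jungle with a backpack and a stick",
      "cost": 0
    }
  ]
}
"""

# Hypothetical client-side bookkeeping: taskUUID -> label of the image that was sent.
pending_tasks = {"f0a5574f-d653-47f1-ab42-e2c1631f1a47": "troll.png"}


def collect_captions(response: dict, pending: dict) -> tuple[dict, float]:
    """Match caption results to pending tasks by taskUUID and sum any reported cost."""
    captions = {}
    total_cost = 0.0
    for item in response.get("data", []):
        if item.get("taskType") != "imageCaption":
            continue  # not a captioning result
        label = pending.get(item["taskUUID"])
        if label is None:
            continue  # result for a task this client did not send
        captions[label] = item["text"]
        total_cost += float(item.get("cost", 0.0))  # cost is optional
    return captions, total_cost


captions, total_cost = collect_captions(json.loads(raw_response), pending_tasks)
```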