Image Inference API

Generate images from text prompts or transform existing ones using Runware's API. Learn how to do image inference for creative and high-quality results.

Introduction

Image inference is a powerful feature that allows you to generate images from text prompts or transform existing images according to your needs. This process is essential for creating high-quality visuals, whether you're looking to bring creative ideas to life or enhance existing images with new styles or subjects.

There are several types of image inference requests you can make using our API:

  • Text-to-Image: Generate images from descriptive text prompts. This process translates your text into high-quality visuals, allowing you to create detailed and vivid images based on your ideas.
  • Image-to-Image: Perform transformations on existing images, whether they are previously generated images or uploaded images. This process enables you to enhance, modify, or stylize images to create new and visually appealing content. With a single parameter you can control the strength of the transformation.
  • Inpainting: Replace parts of an image with new content, allowing you to remove unwanted elements or improve the overall composition of an image. It's like Image-to-Image but with a mask that defines the area to be transformed.
  • Outpainting: Extend the boundaries of an image by generating new content outside the original frame that seamlessly blends with the existing image. As Inpainting, it uses a mask to define the new area to be generated.

Our API also supports advanced features that allow developers to fine-tune the image generation process with precision:

  • ControlNet: A feature that enables precise control over image generation by using additional input conditions, such as edge maps, poses, or segmentation masks. This allows for more accurate alignment with specific user requirements or styles.
  • LoRA: A technique that helps in adapting models to specific styles or tasks by focusing on particular aspects of the data, enhancing the quality and relevance of the generated images.

Additionally, you can tweak numerous parameters to customize the output, such as adjusting the image dimension, steps, scheduler to use, and other generation settings, providing a high level of flexibility to suit your application's needs.

Our API is really fast because we have unique optimizations, custom-designed hardware, and many other elements that are part of our Sonic Inference Engine.

Request

Our API always accepts an array of objects as input, where each object represents a specific task to be performed. The structure of the object varies depending on the type of the task. For this section, we will focus on the parameters related to image inference tasks.

The following JSON snippets shows the basic structure of a request object. All properties are explained in detail in the next section.

[
  {
    "taskType": "imageInference",
    "taskUUID": "string",
    "outputType": "string",
    "outputFormat": "string",
    "positivePrompt": "string",
    "negativePrompt": "string",
    "height": int,
    "width": int,
    "model": "string",
    "steps": int,
    "CFGScale": float,
    "numberResults": int
  }
]
[
  {
    "taskType": "imageInference",
    "taskUUID": "string",
    "positivePrompt": "string",
    "seedImage": "string",
    "model": "string",
    "height": int,
    "width": int,
    "strength": float,
    "numberResults": int
  }
]
[
  {
    "taskType": "imageInference",
    "taskUUID": "string",
    "positivePrompt": "string",
    "seedImage": "string",
    "maskImage": "string",
    "model": "string",
    "height": int,
    "width": int,
    "strength": float,
    "numberResults": int
  }
]
[
  {
    "taskType": "imageInference",
    "taskUUID": "string",
    "positivePrompt": "string",
    "model": "string",
    "refiner": { 
      "model": "string",
      "startStep": int 
    },
    "height": int,
    "width": int,
    "numberResults": int
  }
]
[
  {
    "taskType": "imageInference",
    "taskUUID": "string",
    "positivePrompt": "string",
    "model": "string",
    "height": int,
    "width": int,
    "numberResults": int,
    "embeddings": [ 
      { 
        "model": "string",
        "weight": float 
      },
      { 
        "model": "string",
        "weight": float 
      } 
    ] 
  }
]
[
  {
    "taskType": "imageInference",
    "taskUUID": "string",
    "positivePrompt": "string",
    "model": "string",
    "height": int,
    "width": int,
    "numberResults": int,
    "controlNet": [ 
      { 
        "model": "string",
        "guideImage": "string",
        "weight": float,
        "startStep": int,
        "endStep": int,
        "controlMode": "string"
      },
      { 
        "model": "string",
        "guideImage": "string",
        "weight": float,
        "startStep": int,
        "endStep": int,
        "controlMode": "string"
      } 
    ] 
  }
]
[
  {
    "taskType": "imageInference",
    "taskUUID": "string",
    "positivePrompt": "string",
    "model": "string",
    "height": int,
    "width": int,
    "numberResults": int,
    "lora": [ 
      { 
        "model": "string",
        "weight": float 
      },
      { 
        "model": "string",
        "weight": float 
      } 
    ] 
  }
]

You can mix multiple ControlNet and LoRA objects in the same request to achieve more complex control over the generation process.

taskType

string required

The type of task to be performed. For this task, the value should be imageInference.

taskUUID

string required UUID v4

When a task is sent to the API you must include a random UUID v4 string using the taskUUID parameter. This string is used to match the async responses to their corresponding tasks.

If you send multiple tasks at the same time, the taskUUID will help you match the responses to the correct tasks.

The taskUUID must be unique for each task you send to the API.

outputType

"base64Data" | "dataURI" | "URL" Default: URL

Specifies the output type in which the image is returned. Supported values are: dataURI, URL, and base64Data.

  • base64Data: The image is returned as a base64-encoded string using the imageBase64Data parameter in the response object.
  • dataURI: The image is returned as a data URI string using the imageDataURI parameter in the response object.
  • URL: The image is returned as a URL string using the imageURL parameter in the response object.

outputFormat

"JPG" | "PNG" | "WEBP" Default: JPG

Specifies the format of the output image. Supported formats are: PNG, JPG and WEBP.

This parameter allows you to specify a URL to which the generated image will be uploaded as binary image data using the HTTP PUT method. For example, an S3 bucket URL can be used as the upload endpoint.

When the image is ready, it will be uploaded to the specified URL.

checkNSFW

boolean Default: false

This parameter is used to enable or disable the NSFW check. When enabled, the API will check if the image contains NSFW (not safe for work) content. This check is done using a pre-trained model that detects adult content in images.

When the check is enabled, the API will return NSFWContent: true in the response object if the image is flagged as potentially sensitive content. If the image is not flagged, the API will return NSFWContent: false.

If this parameter is not used, the parameter NSFWContent will not be included in the response object.

Adds 0.1 seconds to image inference time and incurs additional costs.

The NSFW filter occasionally returns false positives and very rarely false negatives.

includeCost

boolean Default: false

If set to true, the cost to perform the task will be included in the response object.

positivePrompt

string required

A positive prompt is a text instruction to guide the model on generating the image. It is usually a sentence or a paragraph that provides positive guidance for the task. This parameter is essential to shape the desired results.

For example, if the positive prompt is "dragon drinking coffee", the model will generate an image of a dragon drinking coffee. The more detailed the prompt, the more accurate the results.

The length of the prompt must be between 2 and 2000 characters.

A negative prompt is a text instruction to guide the model on generating the image. It is usually a sentence or a paragraph that provides negative guidance for the task. This parameter helps to avoid certain undesired results.

For example, if the negative prompt is "red dragon, cup", the model will follow the positive prompt but will avoid generating an image of a red dragon or including a cup. The more detailed the prompt, the more accurate the results.

The length of the prompt must be between 2 and 2000 characters.

seedImage

string required

When doing Image-to-Image, Inpainting or Outpainting, this parameter is required.

Specifies the seed image to be used for the diffusion process. The image can be specified in one of the following formats:

  • An UUID v4 string of a previously uploaded image or a generated image.
  • A data URI string representing the image. The data URI must be in the format data:<mediaType>;base64, followed by the base64-encoded image. For example: ....
  • A base64 encoded image without the data URI prefix. For example: iVBORw0KGgo....
  • A URL pointing to the image. The image must be accessible publicly.

Supported formats are: PNG, JPG and WEBP.

maskImage

string required

When doing Inpainting, this parameter is required.

Specifies the mask image to be used for the inpainting process. The image can be specified in one of the following formats:

  • An UUID v4 string of a previously uploaded image or a generated image.
  • A data URI string representing the image. The data URI must be in the format data:<mediaType>;base64, followed by the base64-encoded image. For example: ....
  • A base64 encoded image without the data URI prefix. For example: iVBORw0KGgo....
  • A URL pointing to the image. The image must be accessible publicly.

Supported formats are: PNG, JPG and WEBP.

maskMargin

integer Min: 32 Max: 128

Adds extra context pixels around the masked region during inpainting. When this parameter is present, the model will zoom into the masked area, considering these additional pixels to create more coherent and well-integrated details.

This parameter is particularly effective when used with masks generated by the Image Masking API, enabling enhanced detail generation while maintaining natural integration with the surrounding image.

strength

float Min: 0 Max: 1 Default: 0.8

When doing Image-to-Image or Inpainting, this parameter is used to determine the influence of the seedImage image in the generated output. A lower value results in more influence from the original image, while a higher value allows more creative deviation.

height

integer required Min: 128 Max: 2048

Used to define the height dimension of the generated image. Certain models perform better with specific dimensions.

The value must be divisible by 64, eg: 128...512, 576, 640...2048.

width

integer required Min: 128 Max: 2048

Used to define the width dimension of the generated image. Certain models perform better with specific dimensions.

The value must be divisible by 64, eg: 128...512, 576, 640...2048.

model

string required

We make use of the AIR (Artificial Intelligence Resource) system to identify models. This identifier is a unique string that represents a specific model.

You can find the AIR identifier of the model you want to use in our Model Explorer, which is a tool that allows you to search for models based on their characteristics.

vae

string

We make use of the AIR (Artificial Intelligence Resource) system to identify VAE models. This identifier is a unique string that represents a specific model.

The VAE (Variational Autoencoder) can be specified to override the default one included with the base model, which can help improve the quality of generated images.

You can find the AIR identifier of the VAE model you want to use in our Model Explorer, which is a tool that allows you to search for models based on their characteristics.

steps

integer Min: 1 Max: 100 Default: 20

The number of steps is the number of iterations the model will perform to generate the image. The higher the number of steps, the more detailed the image will be. However, increasing the number of steps will also increase the time it takes to generate the image and may not always result in a better image (some schedulers work differently).

When using your own models you can specify a new default value for the number of steps.

scheduler

string Default: Model's scheduler

An scheduler is a component that manages the inference process. Different schedulers can be used to achieve different results like more detailed images, faster inference, or more accurate results.

The default scheduler is the one that the model was trained with, but you can choose a different one to get different results.

Schedulers are explained in more detail in the Schedulers page.

seed

integer Min: 1 Max: 9223372036854776000 Default: Random

A seed is a value used to randomize the image generation. If you want to make images reproducible (generate the same image multiple times), you can use the same seed value.

When requesting multiple images with the same seed, the seed will be incremented by 1 (+1) for each image generated.

CFGScale

float Min: 0 Max: 30 Default: 7

Guidance scale represents how closely the images will resemble the prompt or how much freedom the AI model has. Higher values are closer to the prompt. Low values may reduce the quality of the results.

clipSkip

integer Min: 0 Max: 2

Defines additional layer skips during prompt processing in the CLIP model. Some models already skip layers by default - this parameter adds extra skips on top of those. Different values affect how your prompt is interpreted, which can lead to variations in the generated image.

Defines the syntax to be used for prompt weighting.

Prompt weighting allows you to adjust how strongly different parts of your prompt influence the generated image. Choose between compel notation with advanced weighting operations or sdEmbeds for simple emphasis adjustments.

compel syntax

Adds 0.2 seconds to image inference time and incurs additional costs.

When compel syntax is selected, you can use the following notation in prompts:

Weighting

Syntax: + - (word)0.9

Increase or decrease the attention given to specific words or phrases.

Examples:

  • Single words: small+ dog, pixar style
  • Multiple words: small dog, (pixar style)-
  • Multiple symbols for more effect: small+++ dog, pixar style
  • Nested weighting: (small+ dog)++, pixar style
  • Explicit weight percentage: small dog, (pixar)1.2 style

Blend

Syntax: .blend()

Merge multiple conditioning prompts.

Example: ("small dog", "robot").blend(1, 0.8)

Conjunction

Syntax: .and()

Break a prompt into multiple clauses and pass them separately.

Example: ("small dog", "pixar style").and()


sdEmbeds syntax

When sdEmbeds syntax is selected, you can use the following notation in prompts:

Weighting

Syntax: (text) (text:number) [text]

Use parentheses () to increase attention, square brackets [] to decrease it. Add a number after the text to specify a custom multiplier.

Examples:

  • Single words: (small) dog, pixar style
  • Multiple words: small dog, [pixar style]
  • Higher emphasis: (small:2.5) dog, pixar style
  • Combined emphasis: (small dog:1.5), pixar style

numberResults

integer Default: 1

The number of images to generate from the specified prompt.

If seed is set, it will be incremented by 1 (+1) for each image generated.

refiner

object

Refiner models help create higher quality image outputs by incorporating specialized models designed to enhance image details and overall coherence. This can be particularly useful when you need results with superior quality, photorealism, or specific aesthetic refinements. Note that refiner models are only SDXL based.

The refiner parameter is an object that contains properties defining how the refinement process should be configured. You can find the properties of the refiner object below.

example of Refiner object
[
  {
    "taskType": "imageInference",
    "taskUUID": "string",
    "positivePrompt": "string",
    "model": "string",
    "height": int,
    "width": int,
    "numberResults": int,
    "refiner": { 
      "model": "string",
      "startStep": int 
    } 
  }
]
parameters

model

string required

We make use of the AIR system to identify refinement models. This identifier is a unique string that represents a specific model. Note that refiner models are only SDXL based.

You can find the AIR identifier of the refinement model you want to use in our Model Explorer, which is a tool that allows you to search for models based on their characteristics.

More information about the AIR system can be found in the Models page.

startStep

integer Min: 2 Max: {steps}

Represents the step number at which the refinement process begins. The initial model will generate the image up to this step, after which the refiner model takes over to enhance the result.

It can take values from 2 (second step) to the number of steps specified.

Alternative parameters: refiner.startStepPercentage.

startStepPercentage

integer Min: 1 Max: 99

Represents the percentage of total steps at which the refinement process begins. The initial model will generate the image up to this percentage of steps before the refiner takes over.

It can take values from 1 to 99.

Alternative parameters: refiner.startStep.

embeddings

object[]

Embeddings (or Textual Inversion) can be used to add specific concepts or styles to your generations. Multiple embeddings can be used at the same time.

The embeddings parameter is an array of objects. Each object contains properties that define which embedding model to use. You can find the properties of the embeddings object below.

example of embeddings array
[
{
  "taskType": "imageInference",
  "taskUUID": "string",
  "positivePrompt": "string",
  "model": "string",
  "height": int,
  "width": int,
  "numberResults": int,
  "embeddings": [ 
    { 
      "model": "string",
    },
    { 
      "model": "string",
    } 
  ] 
}
]
parameters

model

string required

We make use of the AIR system to identify embeddings models. This identifier is a unique string that represents a specific model.

You can find the AIR identifier of the embeddings model you want to use in our Model Explorer, which is a tool that allows you to search for models based on their characteristics.

weight

float Min: -4 Max: 4 Default: 1

Defines the strength or influence of the embeddings model in the generation process. The value can range from -4 (negative influence) to +4 (maximum influence).

It is possible to use multiple embeddings at the same time.

Example:

"embeddings": [
  { "model": "civitai:1044536@1172007", "weight": 1.5 },
  { "model": "civitai:993446@1113094", "weight": 0.8 }
]

controlNet

object[]

With ControlNet, you can provide a guide image to help the model generate images that align with the desired structure. This guide image can be generated with our ControlNet preprocessing tool, extracting guidance information from an input image. The guide image can be in the form of an edge map, a pose, a depth estimation or any other type of control image that guides the generation process via the ControlNet model.

Multiple ControlNet models can be used at the same time to provide different types of guidance information to the model.

The controlNet parameter is an array of objects. Each object contains properties that define the configuration for a specific ControlNet model. You can find the properties of the ControlNet object below.

example of ControlNet object
[
  {
    "taskType": "imageInference",
    "taskUUID": "string",
    "positivePrompt": "string",
    "model": "string",
    "height": int,
    "width": int,
    "numberResults": int,
    "controlNet": [ 
      { 
        "model": "string",
        "guideImage": "string",
        "weight": float,
        "startStep": int,
        "endStep": int,
        "controlMode": "string"
      },
      { 
        "model": "string",
        "guideImage": "string",
        "weight": float,
        "startStep": int,
        "endStep": int,
        "controlMode": "string"
      } 
    ] 
  }
]
parameters

model

string required

For basic/common ControlNet models, you can check the list of available models here.

For custom or specific ControlNet models, we make use of the AIR system to identify ControlNet models. This identifier is a unique string that represents a specific model.

You can find the AIR identifier of the ControlNet model you want to use in our Model Explorer, which is a tool that allows you to search for models based on their characteristics.

More information about the AIR system can be found in the Models page.

guideImage

string required

Specifies the preprocessed image to be used as guide to control the image generation process. The image can be specified in one of the following formats:

  • An UUID v4 string of a previously uploaded image or a generated image.
  • A data URI string representing the image. The data URI must be in the format data:<mediaType>;base64, followed by the base64-encoded image. For example: ....
  • A base64 encoded image without the data URI prefix. For example: iVBORw0KGgo....
  • A URL pointing to the image. The image must be accessible publicly.

Supported formats are: PNG, JPG and WEBP.

weight

float Min: 0 Max: 1 Default: 1

Represents the weight (strength) of the ControlNet model in the image.

startStep

integer Min: 1 Max: {steps}

Represents the step number at which the ControlNet model starts to control the inference process.

It can take values from 1 (first step) to the number of steps specified.

Alternative parameters: controlNet.startStepPercentage.

startStepPercentage

integer Min: 0 Max: 99

Represents the percentage of steps at which the ControlNet model starts to control the inference process.

It can take values from 0 to 99.

Alternative parameters: controlNet.startStep.

endStep

integer Min: {startStep + 1} Max: {steps}

Represents the step number at which the ControlNet preprocessor ends to control the inference process.

It can take values higher than startStep and less than or equal to the number of steps specified.

Alternative parameters: controlNet.endStepPercentage.

endStepPercentage

integer Min: {startStepPercentage + 1} Max: 100

Represents the percentage of steps at which the ControlNet model ends to control the inference process.

It can take values higher than startStepPercentage and lower than or equal to 100.

Alternative parameters: controlNet.endStep.

controlMode

string

This parameter has 3 options: prompt, controlnet and balanced.

  • prompt: Prompt is more important in guiding image generation.
  • controlnet: ControlNet is more important in guiding image generation.
  • balanced: Balanced operation of prompt and ControlNet.

lora

object[]

With LoRA (Low-Rank Adaptation), you can adapt a model to specific styles or features by emphasizing particular aspects of the data. This technique enhances the quality and relevance of the generated images and can be especially useful in scenarios where the generated images need to adhere to a specific artistic style or follow particular guidelines.

Multiple LoRA models can be used at the same time to achieve different adaptation goals.

The lora parameter is an array of objects. Each object contains properties that define the configuration for a specific LoRA model. You can find the properties of the LoRA object below.

example of LoRA object
[
{
  "taskType": "imageInference",
  "taskUUID": "string",
  "positivePrompt": "string",
  "model": "string",
  "height": int,
  "width": int,
  "numberResults": int,
  "lora": [ 
    { 
      "model": "string",
      "weight": float 
    },
    { 
      "model": "string",
      "weight": float 
    } 
  ] 
}
]
parameters

model

string required

We make use of the AIR system to identify LoRA models. This identifier is a unique string that represents a specific model.

You can find the AIR identifier of the LoRA model you want to use in our Model Explorer, which is a tool that allows you to search for models based on their characteristics.

More information about the AIR system can be found in the Models page.

Example: civitai:132942@146296.

weight

float Min: -4 Max: 4 Default: 1

Defines the strength or influence of the LoRA model in the generation process. The value can range from -4 (negative influence) to +4 (maximum influence).

It is possible to use multiple LoRAs at the same time.

Example:

"lora": [
  { "model": "runware:13090@1", "weight": 1.5 },
  { "model": "runware:6638@1", "weight": 0.8 }
]

Response

Results will be delivered in the format below. It's possible to receive one or multiple images per message. This is due to the fact that images are generated in parallel, and generation time varies across nodes or the network.

{
  "data": [
    {
      "taskType": "imageInference",
      "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
      "imageUUID": "77da2d99-a6d3-44d9-b8c0-ae9fb06b6200",
      "imageURL": "https://im.runware.ai/image/ws/0.5/ii/a770f077-f413-47de-9dac-be0b26a35da6.jpg",
      "cost": 0.0013
    }
  ]
}

taskType

string

The API will return the taskType you sent in the request. In this case, it will be imageInference. This helps match the responses to the correct task type.

taskUUID

string UUID v4

The API will return the taskUUID you sent in the request. This way you can match the responses to the correct request tasks.

imageUUID

string UUID v4

The unique identifier of the image.

imageURL

string

If outputType is set to URL, this parameter contains the URL of the image to be downloaded.

If outputType is set to base64Data, this parameter contains the base64-encoded image data.

imageDataURI

string

If outputType is set to dataURI, this parameter contains the data URI of the image.

NSFWContent

boolean

If checkNSFW parameter is used, NSFWContent is included informing if the image has been flagged as potentially sensitive content.

  • true indicates the image has been flagged (is a sensitive image).
  • false indicates the image has not been flagged.

The filter occasionally returns false positives and very rarely false negatives.

cost

float

if includeCost is set to true, the response will include a cost field for each task object. This field indicates the cost of the request in USD.