Image Masking
Introduction
The Image Masking API provides intelligent detection and mask generation for specific elements in images, particularly optimized for faces, hands, and people. Built on advanced detection models, this feature enhances the inpainting workflow by automatically creating precise masks around detected elements, enabling targeted enhancement and detailing.
How it works
The masks generated by this API can be utilized in two powerful inpainting workflows, offering both convenience and advanced detail enhancement capabilities.
In the standard inpainting workflow, these automatically generated masks eliminate the need for manual mask creation. Once the mask is generated by selecting the appropriate detection model (face, hand, or person), it can be directly used in an inpainting request. This automation is particularly valuable for batch processing or when consistent mask creation is needed across multiple images.
However, the most powerful application comes when using these masks for detail enhancement through the inpainting process's zooming capability. In this workflow, after the mask is automatically generated around detected elements (like faces or hands), the inpainting model will zoom into the masked area when the `maskMargin` parameter is present. This parameter is crucial because it adds extra context pixels around the masked region. For example, if you're enhancing a face, `maskMargin` ensures the model can see enough of the surrounding area to create coherent and well-integrated details.
The full process typically follows these steps:
- Submit the original image to the image masking API, specifying the desired detection model.
- Receive a precise mask identifying the target area (face, hand, or person).
- Use both the original image and the generated mask in an inpainting request, including the `maskMargin` parameter to enable zooming.
- The inpainting model will zoom into every masked region, considering the extra context area specified by `maskMargin`. This allows the model to add finer details to the masked areas and blend them smoothly into your original image for a natural, refined look.
This combination of automatic masking and zoom-enabled inpainting is particularly effective for enhancing specific features while maintaining natural integration with the surrounding image context.
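The two-step workflow above can be sketched as a pair of task builders. This is a minimal illustration: the `imageMasking` fields come from this page, while the inpainting task's field names (`seedImage`, `maskImage`, `positivePrompt`) and task type are assumptions for illustration; only `maskMargin` is documented here.

```python
import uuid

def build_masking_task(input_image, model="runware:35@1"):
    """Build an imageMasking task object (fields documented below)."""
    return {
        "taskType": "imageMasking",
        "taskUUID": str(uuid.uuid4()),  # must be unique per task
        "inputImage": input_image,
        "model": model,  # face_yolov8n: lightweight face detection
        "outputType": "URL",
        "outputFormat": "PNG",
    }

def build_inpainting_task(seed_image, mask_image, prompt, mask_margin=64):
    """Build the follow-up inpainting task that zooms into the masked area.
    Task type and most field names here are assumed for illustration."""
    return {
        "taskType": "imageInference",
        "taskUUID": str(uuid.uuid4()),
        "seedImage": seed_image,
        "maskImage": mask_image,  # the mask returned by imageMasking
        "positivePrompt": prompt,
        "maskMargin": mask_margin,  # extra context pixels around the mask
    }
```

You would send the first task, wait for its response, then feed the returned mask into the second task.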
Request
Our API always accepts an array of objects as input, where each object represents a specific task to be performed. The structure of the object varies depending on the type of the task. For this section, we will focus on the parameters related to the image masking task.
The following JSON snippet shows the basic structure of a request object. All properties are explained in detail in the next section.
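A plausible request object, assembled from the parameters documented below (the UUID and image identifier are placeholder values):

```json
[
  {
    "taskType": "imageMasking",
    "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
    "inputImage": "59a2edc2-45e6-429f-9e6e-4f1e5e5b6a1c",
    "model": "runware:35@1",
    "outputType": "URL",
    "outputFormat": "PNG",
    "includeCost": true,
    "confidence": 0.25,
    "maxDetections": 6,
    "maskPadding": 4,
    "maskBlur": 4
  }
]
```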
- `taskType`: The type of task to be performed. For this task, the value should be `imageMasking`.
- `taskUUID`: When a task is sent to the API you must include a random UUID v4 string using the `taskUUID` parameter. This string is used to match the async responses to their corresponding tasks. If you send multiple tasks at the same time, the `taskUUID` will help you match the responses to the correct tasks. The `taskUUID` must be unique for each task you send to the API.
- `outputType`: Specifies the output type in which the image is returned. Supported values are: `dataURI`, `URL`, and `base64Data`.
  - `base64Data`: The image is returned as a base64-encoded string using the `maskImageBase64Data` parameter in the response object.
  - `dataURI`: The image is returned as a data URI string using the `maskImageDataURI` parameter in the response object.
  - `URL`: The image is returned as a URL string using the `maskImageURL` parameter in the response object.
- `outputFormat`: Specifies the format of the output image. Supported formats are: `PNG`, `JPG`, and `WEBP`.
- `uploadEndpoint`: This parameter allows you to specify a URL to which the generated image will be uploaded as binary image data using the HTTP PUT method. For example, an S3 bucket URL can be used as the upload endpoint. When the image is ready, it will be uploaded to the specified URL.
- `includeCost`: If set to `true`, the cost to perform the task will be included in the response object.
- `inputImage`: Specifies the input image to be processed for mask generation. The generated mask will identify specific elements in the image (faces, hands, or people) based on the selected detection model. The input image can be specified in one of the following formats:
  - A UUID v4 string of a previously uploaded image or a generated image.
  - A data URI string representing the image. The data URI must be in the format `data:<mediaType>;base64,` followed by the base64-encoded image. For example: `data:image/png;base64,iVBORw0KGgo...`.
  - A base64-encoded image without the data URI prefix. For example: `iVBORw0KGgo...`.
  - A URL pointing to the image. The image must be publicly accessible.

  Supported formats are: PNG, JPG, and WEBP.
- `model`: Specifies the specialized detection model to use for mask generation. Currently supported models:

  Full face detection:
  - `runware:35@1` (face_yolov8n): Lightweight model for 2D/realistic face detection.
  - `runware:35@2` (face_yolov8s): Enhanced face detection with improved accuracy.
  - `runware:35@6` (mediapipe_face_full): Specialized for realistic face detection.
  - `runware:35@7` (mediapipe_face_short): Optimized face detection with reduced complexity.
  - `runware:35@8` (mediapipe_face_mesh): Advanced face detection with mesh mapping.

  Facial features:
  - `runware:35@9` (mediapipe_face_mesh_eyes_only): Focused detection of eye regions.
  - `runware:35@15` (eyes_mesh_mediapipe): Specialized eyes detection.
  - `runware:35@13` (nose_mesh_mediapipe): Specialized nose detection.
  - `runware:35@14` (lips_mesh_mediapipe): Specialized lips detection.
  - `runware:35@10` (eyes_lips_mesh): Detection of eyes and lips areas.
  - `runware:35@11` (nose_eyes_mesh): Detection of nose and eyes areas.
  - `runware:35@12` (nose_lips_mesh): Detection of nose and lips areas.

  Other body parts:
  - `runware:35@3` (hand_yolov8n): Specialized for 2D/realistic hand detection.
  - `runware:35@4` (person_yolov8n-seg): Person detection and segmentation.
  - `runware:35@5` (person_yolov8s-seg): Advanced person detection with higher precision.

  Each model is optimized for specific use cases and offers different trade-offs between speed and accuracy.
- `confidence`: Confidence threshold for detections. Only detections with confidence scores above this threshold will be included in the mask. Lower confidence values will detect more objects but may introduce false positives.
- `maxDetections`: Limits the maximum number of elements (faces, hands, or people) that will be detected and masked in the image. If there are more elements than this value, only the ones with the highest confidence scores will be included.
- `maskPadding`: Extends or reduces the detected mask area by the specified number of pixels. Positive values create a larger masked region (useful when you want to ensure complete coverage of the element), while negative values shrink the mask (useful for tighter, more focused areas).
- `maskBlur`: Extends the mask by the specified number of pixels with a gradual fade-out effect, creating smooth transitions between masked and unmasked regions in the final result. Note: the blur is always applied to the outer edge of the mask, regardless of whether `maskPadding` is used.
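The `inputImage` parameter accepts a data URI string, among other formats. A minimal sketch of producing one from raw image bytes (the helper name is illustrative, not part of the API):

```python
import base64

def image_to_data_uri(raw_bytes: bytes, media_type: str = "image/png") -> str:
    """Encode raw image bytes as a data URI in the format
    data:<mediaType>;base64,<payload> accepted by inputImage."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return f"data:{media_type};base64,{encoded}"

# Example with the 8-byte PNG file signature:
print(image_to_data_uri(b"\x89PNG\r\n\x1a\n"))
# -> data:image/png;base64,iVBORw0KGgo=
```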
Response
Results will be delivered in the format below. You may receive one or multiple results per message, because images are generated in parallel and generation time varies across nodes and the network.
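Based on the fields documented below, a single response object might look like the following (the envelope, UUIDs, URL, coordinates, and cost value are illustrative placeholders):

```json
{
  "data": [
    {
      "taskType": "imageMasking",
      "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
      "inputImageUUID": "59a2edc2-45e6-429f-9e6e-4f1e5e5b6a1c",
      "maskImageUUID": "b6a06b3b-ce32-4884-ad93-c5eca7937ba2",
      "maskImageURL": "https://example.com/masks/b6a06b3b.png",
      "detections": [
        { "x_min": 214, "y_min": 88, "x_max": 342, "y_max": 236 }
      ],
      "cost": 0.0006
    }
  ]
}
```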
- `taskType`: The API will return the `taskType` you sent in the request. In this case, it will be `imageMasking`. This helps match the responses to the correct task type.
- `taskUUID`: The API will return the `taskUUID` you sent in the request. This way you can match the responses to the correct request tasks.
- `inputImageUUID`: The unique identifier of the original image used as input for the masking task.
- `detections`: An array of objects containing the coordinates of each detected element in the image. Each object provides the bounding box coordinates of a detected face, hand, or person (depending on the model used). Each detection object includes:
  - `x_min`: Leftmost coordinate of the detected area.
  - `y_min`: Topmost coordinate of the detected area.
  - `x_max`: Rightmost coordinate of the detected area.
  - `y_max`: Bottommost coordinate of the detected area.

  These coordinates can be useful for further processing or for understanding the exact location of detected elements in the image.
- `maskImageUUID`: The unique identifier of the mask image.
- `maskImageURL`: If `outputType` is set to `URL`, this parameter contains the URL of the mask image to be downloaded.
- `maskImageBase64Data`: If `outputType` is set to `base64Data`, this parameter contains the base64-encoded data of the mask image.
- `maskImageDataURI`: If `outputType` is set to `dataURI`, this parameter contains the data URI of the mask image.
- `cost`: If `includeCost` is set to `true`, the response will include a `cost` field for each task object. This field indicates the cost of the request in USD.
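The bounding boxes in `detections` can feed further local processing. As a sketch, the following expands a detection box by a fixed number of pixels, clamped to the image bounds, mirroring what `maskPadding` does server-side (this is a local illustration, not an API call):

```python
def padded_box(det, padding, width, height):
    """Expand (or shrink, for negative padding) a detection bounding
    box, clamping the result to the image dimensions."""
    return {
        "x_min": max(det["x_min"] - padding, 0),
        "y_min": max(det["y_min"] - padding, 0),
        "x_max": min(det["x_max"] + padding, width),
        "y_max": min(det["y_max"] + padding, height),
    }

# Example: expand a face detection by 15 px in a 100x64 image.
face = {"x_min": 10, "y_min": 20, "x_max": 50, "y_max": 60}
print(padded_box(face, 15, 100, 64))
# -> {'x_min': 0, 'y_min': 5, 'x_max': 65, 'y_max': 64}
```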