Image Masking
Introduction
The Image Masking API provides intelligent detection and mask generation for specific elements in images, particularly optimized for faces, hands, and people. Built on advanced detection models, this feature enhances the inpainting workflow by automatically creating precise masks around detected elements, enabling targeted enhancement and detailing.
How it works
The masks generated by this API can be utilized in two powerful inpainting workflows, offering both convenience and advanced detail enhancement capabilities.
In the standard inpainting workflow, these automatically generated masks eliminate the need for manual mask creation. Once the mask is generated by selecting the appropriate detection model (face, hand, or person), it can be directly used in an inpainting request. This automation is particularly valuable for batch processing or when consistent mask creation is needed across multiple images.
However, the most powerful application comes when using these masks for detail enhancement through the inpainting process's zooming capability. In this workflow, after the mask is automatically generated around detected elements (like faces or hands), the inpainting model will zoom into the masked area when the `maskMargin` parameter is present. This parameter is crucial because it adds extra context pixels around the masked region. For example, if you're enhancing a face, `maskMargin` ensures the model can see enough of the surrounding area to create coherent and well-integrated details.
The full process typically follows these steps:
- Submit the original image to the image masking API, specifying the desired detection model.
- Receive a precise mask identifying the target area (face, hand, or person).
- Use both the original image and the generated mask in an inpainting request, including the `maskMargin` parameter to enable zooming.
- The inpainting model will zoom into every masked region, considering the extra context area specified by `maskMargin`. This allows the model to add finer details to the masked areas and blend them smoothly into your original image for a natural, refined look.
This combination of automatic masking and zoom-enabled inpainting is particularly effective for enhancing specific features while maintaining natural integration with the surrounding image context.
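The two-step workflow above can be sketched as a pair of task builders. This is a minimal illustration: the `imageMasking` fields come from this page, while the inpainting task's field names (`seedImage`, `maskImage`, `positivePrompt`) and task type are assumptions for illustration; only `maskMargin` is documented here.

```python
import uuid

def build_masking_task(input_image, model="runware:35@1"):
    """Build an imageMasking task object (fields documented below)."""
    return {
        "taskType": "imageMasking",
        "taskUUID": str(uuid.uuid4()),  # must be unique per task
        "inputImage": input_image,
        "model": model,  # face_yolov8n: lightweight face detection
        "outputType": "URL",
        "outputFormat": "PNG",
    }

def build_inpainting_task(seed_image, mask_image, prompt, mask_margin=64):
    """Build the follow-up inpainting task that zooms into the masked area.
    Task type and most field names here are assumed for illustration."""
    return {
        "taskType": "imageInference",
        "taskUUID": str(uuid.uuid4()),
        "seedImage": seed_image,
        "maskImage": mask_image,  # the mask returned by imageMasking
        "positivePrompt": prompt,
        "maskMargin": mask_margin,  # extra context pixels around the mask
    }
```

You would send the first task, wait for its response, then feed the returned mask into the second task.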
Request
Our API always accepts an array of objects as input, where each object represents a specific task to be performed. The structure of the object varies depending on the type of the task. For this section, we will focus on the parameters related to the image masking task.
The following JSON snippet shows the basic structure of a request object. All properties are explained in detail in the next section.
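A plausible request object, assembled from the parameters documented below (the UUID and image identifier are placeholder values):

```json
[
  {
    "taskType": "imageMasking",
    "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
    "inputImage": "59a2edc2-45e6-429f-9e6e-4f1e5e5b6a1c",
    "model": "runware:35@1",
    "outputType": "URL",
    "outputFormat": "PNG",
    "includeCost": true,
    "confidence": 0.25,
    "maxDetections": 6,
    "maskPadding": 4,
    "maskBlur": 4
  }
]
```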
- `taskType`: The type of task to be performed. For this task, the value should be `imageMasking`.
- `taskUUID`: When a task is sent to the API you must include a random UUID v4 string using the `taskUUID` parameter. This string is used to match the async responses to their corresponding tasks. If you send multiple tasks at the same time, the `taskUUID` will help you match the responses to the correct tasks. The `taskUUID` must be unique for each task you send to the API.
- `outputType`: Specifies the output type in which the image is returned. Supported values are: `dataURI`, `URL`, and `base64Data`.
  - `base64Data`: The image is returned as a base64-encoded string using the `maskImageBase64Data` parameter in the response object.
  - `dataURI`: The image is returned as a data URI string using the `maskImageDataURI` parameter in the response object.
  - `URL`: The image is returned as a URL string using the `maskImageURL` parameter in the response object.
- `outputFormat`: Specifies the format of the output image. Supported formats are: `PNG`, `JPG`, and `WEBP`.
- `uploadEndpoint`: This parameter allows you to specify a URL to which the generated image will be uploaded as binary image data using the HTTP PUT method. For example, an S3 bucket URL can be used as the upload endpoint. When the image is ready, it will be uploaded to the specified URL.
- `includeCost`: If set to `true`, the cost to perform the task will be included in the response object.
- `inputImage`: Specifies the input image to be processed for mask generation. The generated mask will identify specific elements in the image (faces, hands, or people) based on the selected detection model. The input image can be specified in one of the following formats:
  - A UUID v4 string of a previously uploaded image or a generated image.
  - A data URI string representing the image. The data URI must be in the format `data:<mediaType>;base64,` followed by the base64-encoded image. For example: `data:image/png;base64,iVBORw0KGgo...`.
  - A base64-encoded image without the data URI prefix. For example: `iVBORw0KGgo...`.
  - A URL pointing to the image. The image must be publicly accessible.

  Supported formats are: PNG, JPG, and WEBP.
- `model`: Specifies the specialized detection model to use for mask generation. Currently supported models:

  Full face detection:
  - `runware:35@1` (face_yolov8n): Lightweight model for 2D/realistic face detection.
  - `runware:35@2` (face_yolov8s): Enhanced face detection with improved accuracy.
  - `runware:35@6` (mediapipe_face_full): Specialized for realistic face detection.
  - `runware:35@7` (mediapipe_face_short): Optimized face detection with reduced complexity.
  - `runware:35@8` (mediapipe_face_mesh): Advanced face detection with mesh mapping.

  Facial features:
  - `runware:35@9` (mediapipe_face_mesh_eyes_only): Focused detection of eye regions.
  - `runware:35@15` (eyes_mesh_mediapipe): Specialized eyes detection.
  - `runware:35@13` (nose_mesh_mediapipe): Specialized nose detection.
  - `runware:35@14` (lips_mesh_mediapipe): Specialized lips detection.
  - `runware:35@10` (eyes_lips_mesh): Detection of eyes and lips areas.
  - `runware:35@11` (nose_eyes_mesh): Detection of nose and eyes areas.
  - `runware:35@12` (nose_lips_mesh): Detection of nose and lips areas.

  Other body parts:
  - `runware:35@3` (hand_yolov8n): Specialized for 2D/realistic hand detection.
  - `runware:35@4` (person_yolov8n-seg): Person detection and segmentation.
  - `runware:35@5` (person_yolov8s-seg): Advanced person detection with higher precision.

  Each model is optimized for specific use cases and offers different trade-offs between speed and accuracy.
- `confidence`: Confidence threshold for detections. Only detections with confidence scores above this threshold will be included in the mask. Lower confidence values will detect more objects but may introduce false positives.
- `maxDetections`: Limits the maximum number of elements (faces, hands, or people) that will be detected and masked in the image. If there are more elements than this value, only the ones with the highest confidence scores will be included.
- `maskPadding`: Extends or reduces the detected mask area by the specified number of pixels. Positive values create a larger masked region (useful when you want to ensure complete coverage of the element), while negative values shrink the mask (useful for tighter, more focused areas).
- `maskBlur`: Extends the mask by the specified number of pixels with a gradual fade-out effect, creating smooth transitions between masked and unmasked regions in the final result. Note: the blur is always applied to the outer edge of the mask, regardless of whether `maskPadding` is used.
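The `inputImage` parameter accepts a data URI string, among other formats. A minimal sketch of producing one from raw image bytes (the helper name is illustrative, not part of the API):

```python
import base64

def image_to_data_uri(raw_bytes: bytes, media_type: str = "image/png") -> str:
    """Encode raw image bytes as a data URI in the format
    data:<mediaType>;base64,<payload> accepted by inputImage."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return f"data:{media_type};base64,{encoded}"

# Example with the 8-byte PNG file signature:
print(image_to_data_uri(b"\x89PNG\r\n\x1a\n"))
# -> data:image/png;base64,iVBORw0KGgo=
```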
Response
Results will be delivered in the format below. You may receive one or multiple results per message, because images are generated in parallel and generation time varies across nodes and the network.
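Based on the fields documented below, a single response object might look like the following (the envelope, UUIDs, URL, coordinates, and cost value are illustrative placeholders):

```json
{
  "data": [
    {
      "taskType": "imageMasking",
      "taskUUID": "a770f077-f413-47de-9dac-be0b26a35da6",
      "inputImageUUID": "59a2edc2-45e6-429f-9e6e-4f1e5e5b6a1c",
      "maskImageUUID": "b6a06b3b-ce32-4884-ad93-c5eca7937ba2",
      "maskImageURL": "https://example.com/masks/b6a06b3b.png",
      "detections": [
        { "x_min": 214, "y_min": 88, "x_max": 342, "y_max": 236 }
      ],
      "cost": 0.0006
    }
  ]
}
```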
- `taskType`: The API will return the `taskType` you sent in the request. In this case, it will be `imageMasking`. This helps match the responses to the correct task type.
- `taskUUID`: The API will return the `taskUUID` you sent in the request. This way you can match the responses to the correct request tasks.
- `inputImageUUID`: The unique identifier of the original image used as input for the masking task.
- `detections`: An array of objects containing the coordinates of each detected element in the image. Each object provides the bounding box coordinates of a detected face, hand, or person (depending on the model used). Each detection object includes:
  - `x_min`: Leftmost coordinate of the detected area.
  - `y_min`: Topmost coordinate of the detected area.
  - `x_max`: Rightmost coordinate of the detected area.
  - `y_max`: Bottommost coordinate of the detected area.

  These coordinates can be useful for further processing or for understanding the exact location of detected elements in the image.
- `maskImageUUID`: The unique identifier of the mask image.
- `maskImageURL`: If `outputType` is set to `URL`, this parameter contains the URL of the mask image to be downloaded.
- `maskImageBase64Data`: If `outputType` is set to `base64Data`, this parameter contains the base64-encoded data of the mask image.
- `maskImageDataURI`: If `outputType` is set to `dataURI`, this parameter contains the data URI of the mask image.
- `cost`: If `includeCost` is set to `true`, the response will include a `cost` field for each task object. This field indicates the cost of the request in USD.
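The bounding boxes in `detections` can feed further local processing. As a sketch, the following expands a detection box by a fixed number of pixels, clamped to the image bounds, mirroring what `maskPadding` does server-side (this is a local illustration, not an API call):

```python
def padded_box(det, padding, width, height):
    """Expand (or shrink, for negative padding) a detection bounding
    box, clamping the result to the image dimensions."""
    return {
        "x_min": max(det["x_min"] - padding, 0),
        "y_min": max(det["y_min"] - padding, 0),
        "x_max": min(det["x_max"] + padding, width),
        "y_max": min(det["y_max"] + padding, height),
    }

# Example: expand a face detection by 15 px in a 100x64 image.
face = {"x_min": 10, "y_min": 20, "x_max": 50, "y_max": 60}
print(padded_box(face, 15, 100, 64))
# -> {'x_min': 0, 'y_min': 5, 'x_max': 65, 'y_max': 64}
```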