PhotoMaker API
Introduction
PhotoMaker is a powerful image generation technology that enables instant subject personalization without the need for additional training. By providing up to four reference images, you can generate new images that maintain subject fidelity while applying various styles and compositions. This innovative approach offers a perfect balance between preserving the subject's identity and allowing creative freedom in generating new scenarios and styles.
Key features
- Instant Personalization: Transform subjects into new scenes and styles within seconds, with no additional training required.
- High Fidelity: Maintains impressive subject consistency using up to 4 reference images.
- Style Flexibility: Supports various artistic styles while preserving subject identity.
- Text Controllability: Precise control over the final output through detailed text prompts.
- Strength Control: Discover the perfect balance between original subject features and creative transformation.
How it works
PhotoMaker leverages advanced AI technology to analyze up to four reference images of your subject, creating a comprehensive understanding of their key identifying features and characteristics. This initial analysis forms the foundation for maintaining consistent subject appearance throughout the generation process.
Using your provided text prompt as guidance, PhotoMaker then generates new images that seamlessly blend the subject's identity with your desired style settings. The strength parameter allows you to fine-tune this balance.
To achieve optimal results with PhotoMaker:
- Use up to 4 clear reference images of your subject, preferably with good lighting.
- Avoid using heavily filtered or edited images as references, as they may affect the quality of the output.
- Write specific prompts describing desired scene and composition.
- Start with
No style
for maximum subject fidelity, then experiment with different styles and strength values to find the perfect balance.
Request
Our API always accepts an array of objects as input, where each object represents a specific task to be performed. The structure of the object varies depending on the type of the task. For this section, we will focus on the parameters related to PhotoMaker task.
The following JSON snippets shows the basic structure of a request object. All properties are explained in detail in the next section.
-
taskType
-
The type of task to be performed. For this task, the value should be
photoMaker
. -
taskUUID
-
When a task is sent to the API you must include a random UUID v4 string using the
taskUUID
parameter. This string is used to match the async responses to their corresponding tasks.If you send multiple tasks at the same time, the
taskUUID
will help you match the responses to the correct tasks.The
taskUUID
must be unique for each task you send to the API. -
outputType
-
Specifies the output type in which the image is returned. Supported values are:
dataURI
,URL
, andbase64Data
.base64Data
: The image is returned as a base64-encoded string using theimageBase64Data
parameter in the response object.dataURI
: The image is returned as a data URI string using theimageDataURI
parameter in the response object.URL
: The image is returned as a URL string using theimageURL
parameter in the response object.
-
outputFormat
-
Specifies the format of the output image. Supported formats are:
PNG
,JPG
andWEBP
. -
uploadEndpoint
-
This parameter allows you to specify a URL to which the generated image will be uploaded as binary image data using the HTTP PUT method. For example, an S3 bucket URL can be used as the upload endpoint.
When the image is ready, it will be uploaded to the specified URL.
-
checkNSFW
-
This parameter is used to enable or disable the NSFW check. When enabled, the API will check if the image contains NSFW (not safe for work) content. This check is done using a pre-trained model that detects adult content in images.
When the check is enabled, the API will return
NSFWContent: true
in the response object if the image is flagged as potentially sensitive content. If the image is not flagged, the API will returnNSFWContent: false
.If this parameter is not used, the parameter
NSFWContent
will not be included in the response object.Adds 0.1 seconds to image inference time and incurs additional costs.
The NSFW filter occasionally returns false positives and very rarely false negatives.
-
includeCost
-
If set to
true
, the cost to perform the task will be included in the response object. -
inputImages
-
Specifies the input images that will be used as reference for the subject. These reference images help the AI to maintain identity consistency during the generation process. Each input image can be specified in one of the following formats:
- An UUID v4 string of a previously uploaded image or a generated image.
- A data URI string representing the image. The data URI must be in the format
data:<mediaType>;base64,
followed by the base64-encoded image. For example:...
. - A base64 encoded image without the data URI prefix. For example:
iVBORw0KGgo...
. - A URL pointing to the image. The image must be accessible publicly.
Supported formats are: PNG, JPG and WEBP.
-
style
-
Specifies the artistic style to be applied to the generated images. Currently supported values are:
No style
: Maximizes subject fidelity while allowing creative freedom in the composition.Cinematic
: Applies a movie-like aesthetic.Disney Character
: Transforms the subject into a Disney-inspired character.Digital Art
: Creates a digital artwork style.Photographic
: Enhances photographic qualities.Fantasy art
: Applies fantasy-themed artistic elements.Neonpunk
: Creates a neon-colored cyberpunk aesthetic.Enhance
: Improves overall image quality.Comic book
: Transforms the subject into comic book style.Lowpoly
: Creates a low-polygon geometric style.Line art
: Converts the image into line drawing style.
-
strength
-
Controls the balance between preserving the subject's original features and the creative transformation specified in the prompt.
- Lower values provide stronger subject fidelity.
- Higher values allow more creative freedom in the transformation.
-
positivePrompt
-
A positive prompt is a text instruction to guide the model on generating the image. It is usually a sentence or a paragraph that provides positive guidance for the task. This parameter is essential to shape the desired results.
The trigger word
rwre
can be included anywhere in your prompt. If not included, it will be automatically prepended to your prompt. This trigger word is required to activate PhotoMaker's specific features for personalized image generation.Without trigger word:
With trigger word:
The length of the prompt must be between 2 and 2000 characters.
-
negativePrompt
-
A negative prompt is a text instruction to guide the model on generating the image. It is usually a sentence or a paragraph that provides negative guidance for the task. This parameter helps to avoid certain undesired results.
For example, if the negative prompt is "red dragon, cup", the model will follow the positive prompt but will avoid generating an image of a red dragon or including a cup. The more detailed the prompt, the more accurate the results.
The length of the prompt must be between 2 and 2000 characters.
-
height
-
Used to define the height dimension of the generated image. Certain models perform better with specific dimensions.
The value must be divisible by 64, eg: 128...512, 576, 640...2048.
-
width
-
Used to define the width dimension of the generated image. Certain models perform better with specific dimensions.
The value must be divisible by 64, eg: 128...512, 576, 640...2048.
-
model
-
We make use of the AIR (Artificial Intelligence Resource) system to identify models. This identifier is a unique string that represents a specific model.
PhotoMaker requires SDXL-based models. While any SDXL model should work, these models have been thoroughly tested:
AIR ID Name Version civitai:139562@344487
RealVisXL V4.0 (BakedVAE) civitai:101055@128078
SDXL v1.0 VAE fix civitai:112902@126688
DreamShaper XL alpha2 (xl1.0) civitai:133005@288982
Juggernaut XL V 8 + RunDiffusion civitai:133005@782002
Juggernaut XL Jugg_XI_by_RunDiffusion civitai:133005@456194
Juggernaut XL Jugg_X_by RunDiffusion civitai:133005@240840
Juggernaut XL V 7 + RunDiffusion civitai:133005@198530
Juggernaut XL Version 6 + RunDiffusion civitai:133005@348913
Juggernaut XL V9 + RunDiffusionPhoto 2 civitai:152525@293240
Realism Engine SDXL v3.0 VAE civitai:133005@471120
Juggernaut XL Jugg_X_RunDiffusion_Hyper You can find additional SDXL models in our Model Explorer which is a tool that allows you to search for models based on their characteristics.
-
steps
-
The number of steps is the number of iterations the model will perform to generate the image. The higher the number of steps, the more detailed the image will be. However, increasing the number of steps will also increase the time it takes to generate the image and may not always result in a better image (some schedulers work differently).
When using your own models you can specify a new default value for the number of steps.
-
scheduler
-
An scheduler is a component that manages the inference process. Different schedulers can be used to achieve different results like more detailed images, faster inference, or more accurate results.
The default scheduler is the one that the model was trained with, but you can choose a different one to get different results.
Schedulers are explained in more detail in the Schedulers page.
-
seed
-
A seed is a value used to randomize the image generation. If you want to make images reproducible (generate the same image multiple times), you can use the same seed value.
When requesting multiple images with the same seed, the seed will be incremented by
1
(+1) for each image generated. -
CFGScale
-
Guidance scale represents how closely the images will resemble the prompt or how much freedom the AI model has. Higher values are closer to the prompt. Low values may reduce the quality of the results.
-
clipSkip
-
Defines additional layer skips during prompt processing in the CLIP model. Some models already skip layers by default - this parameter adds extra skips on top of those. Different values affect how your prompt is interpreted, which can lead to variations in the generated image.
-
numberResults
-
The number of images to generate from the specified prompt.
If seed is set, it will be incremented by
1
(+1) for each image generated.
Response
Results will be delivered in the format below. It's possible to receive one or multiple images per message. This is due to the fact that images are generated in parallel, and generation time varies across nodes or the network.
-
taskType
-
The API will return the
taskType
you sent in the request. In this case, it will bephotoMaker
. This helps match the responses to the correct task type. -
taskUUID
-
The API will return the
taskUUID
you sent in the request. This way you can match the responses to the correct request tasks. -
inputImagesUUID
-
An array containing the unique identifiers of the original images used as input for the PhotoMaker task.
-
imageUUID
-
The unique identifier of the image.
-
imageURL
-
If
outputType
is set toURL
, this parameter contains the URL of the image to be downloaded. -
imageBase64Data
-
If
outputType
is set tobase64Data
, this parameter contains the base64-encoded image data. -
imageDataURI
-
If
outputType
is set todataURI
, this parameter contains the data URI of the image. -
seed
-
The seed value used for generating the image. Even when not specified in the request (random seed), this value is returned to allow reproducing the same image in future requests.
If multiple images were generated (
numberResults
> 1), each image will have its own seed value, incremented by 1 from the initial seed. -
NSFWContent
-
If checkNSFW parameter is used,
NSFWContent
is included informing if the image has been flagged as potentially sensitive content.true
indicates the image has been flagged (is a sensitive image).false
indicates the image has not been flagged.
The filter occasionally returns false positives and very rarely false negatives.
-
cost
-
if
includeCost
is set totrue
, the response will include acost
field for each task object. This field indicates the cost of the request in USD.