Exactly Illustrative Training
Exactly Illustrative Training is a model training workflow for building custom illustrative models from a company's dataset. It fine-tunes the Exactly Illustrative architecture to capture the style of brand assets, then produces a trained model that can be used for consistent text-to-image and image-to-image generation on the platform.
Training a custom style model
How to train a custom Exactly Illustrative model on a brand's visual style. Covers dataset curation, the training API call, status polling, and using the trained model for consistent text-to-image and image-to-image generation.
Introduction
Exactly Illustrative Training fine-tunes an image model from a small set of reference images. You feed it 10 to 50 images that share a visual style, the model learns that style, and afterwards you generate new imagery in the same look from any prompt. Most teams reach for this when they need brand-consistent illustrations at scale: a content marketing pipeline, a product surface that ships a lot of asset variations, a partner program where the output has to look like it came from the same designer every time.
The worked example throughout this guide is Stellar, a fictional AI stargazing app. Its visual identity is editorial cosmic illustration: deep navy and dusky purple skies, warm gold accents, fine astronomical-engraving line work, slight grain. The walkthrough trains a model on a small dataset that exemplifies that style, then uses the trained model to render new subjects in the same look.
Curating a style dataset
The dataset is the lever. Every other parameter in this workflow is mechanical. The dataset is where you decide what the model learns. Two principles do most of the work.
Hold the style constant. Vary the subject. Every image in your dataset should look like it was produced by the same hand on the same brief. Color treatment, line work, lighting, composition language, level of detail, level of abstraction. Pick the visual axes you want the model to learn, and make sure every image exemplifies them. Then make the subjects as varied as you can: different scenes, different objects, different framings. You're trying to teach the model that "the style" is the constant signal across your dataset, and that everything else is variation. If half your dataset is portraits, the model may conclude that the style includes "is a portrait" and you'll get unwanted bias.
Trim outliers. A single image that breaks the visual language is worth removing even if it's your favorite. The model averages across the dataset, and outliers pull the average in a direction you don't want.
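One way to catch an unintended shared trait (like the half-portraits dataset described above) before training is to tag each candidate image by hand with its subject, framing, and lighting, then flag any tag that covers most of the set. This is a minimal sketch of such a manual audit aid, not part of the training API. The `audit_tags` helper, the tag vocabulary, and the 0.5 threshold are all illustrative assumptions.

```python
from collections import Counter


def audit_tags(tags_by_image: dict, threshold: float = 0.5) -> dict:
    """Return non-style tags that appear on more than `threshold` of the images.

    tags_by_image maps a filename to a set of hand-assigned tags describing
    subject, framing, or lighting (NOT the style you want the model to learn).
    """
    if not tags_by_image:
        return {}
    total = len(tags_by_image)
    counts = Counter(tag for tags in tags_by_image.values() for tag in tags)
    # A tag shared by most images risks being absorbed as part of "the style".
    return {tag: n / total for tag, n in counts.items() if n / total > threshold}


if __name__ == "__main__":
    dataset_tags = {
        "telescope.png": {"object", "landscape"},
        "milky-way.png": {"sky", "landscape"},
        "observatory.png": {"building", "landscape"},
        "saturn.png": {"diagram"},
    }
    print(audit_tags(dataset_tags))  # {'landscape': 0.75}
```

Any tag the audit flags is a candidate for rebalancing: swap a few of those images for ones that keep the style but change that trait.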
Here's the Stellar training dataset, ten images that share one visual language across deliberately varied subjects (a telescope, a planet diagram, a constellation map, a phone mockup, a Milky Way view, etc.):
A vintage brass telescope on a tripod pointing up at a starry sky from a grassy hilltop, fine etched detail on the telescope body, editorial cosmic illustration...
An open star chart card on a wooden surface with hand-drawn constellation lines connecting dots, with the "STELLAR" wordmark printed across the top...
A panoramic view of the Milky Way arching across the night sky above a low desert horizon...
A domed astronomical observatory silhouetted against a deep starry sky, light glowing from its open dome slit...
A detailed close-up illustration of the full moon with its craters, set against deep space...
A vintage brass astrolabe instrument sitting on a wooden desk next to an open notebook with handwritten diagrams...
A small human figure silhouetted from behind, looking up at a sky filled with shooting stars and faint nebulae...
A side-view scientific diagram of the planet Saturn with its rings, labeled in vintage botanical-illustration style...
A celestial map of the constellation Orion with thin gold lines connecting its stars, faint labels in small serif type...
A smartphone mockup standing upright on a desk, showing the "STELLAR" app interface with a star map on screen...
The subjects span instruments, sky views, scientific diagrams, even a UI mockup. The style is fixed across all ten: same palette, same engraving-influenced line work, same grain texture, same dreamy mood. The model will learn that consistent part as "Stellar style" and treat the varying subjects as proof that the style is decoupled from any one of them.
How the Stellar dataset was made. The ten images above were generated with Recraft V4.1, using its color palette feature to lock the deep navy and warm gold tones across all ten prompts. Same palette plus varied prompts gave us style-consistent variety, which is exactly the kind of input a training dataset needs.
Dataset format requirements
The training endpoint expects a single ZIP file containing your images, supplied as inputs.dataset via a public URL. Hard requirements from the schema:
- 10 to 50 images. The endpoint rejects datasets with fewer than 10 or more than 50.
- JPEG, PNG, or WebP. Images are converted to JPEG internally.
- Max 50 MB per image. Files are automatically downscaled to 4096 pixels on the long side, so you don't need to pre-resize, but you do need to keep individual files under the size cap.
Fewer high-quality images beat more weak ones. The upper bound of 50 is useful when you have a deep brand library. For a tightly-defined style, ten carefully-chosen images is often enough.
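Because the limits are checked server-side, it can save a two-hour round trip to check them locally before uploading. The sketch below validates a folder against the requirements listed above and writes the ZIP. Assumptions: `build_dataset_zip` is a hypothetical helper; it checks extensions and file sizes only (it does not decode images); it reads "50 MB" as 50,000,000 bytes, the stricter interpretation; and it stores images at the root of the archive, since the schema here doesn't say whether subfolders are allowed. Hosting the ZIP at a public URL is left to you.

```python
import zipfile
from pathlib import Path

# Limits as stated in the dataset format requirements.
MIN_IMAGES, MAX_IMAGES = 10, 50
MAX_BYTES = 50 * 1000 * 1000  # 50 MB, decimal reading (stricter than MiB)
ALLOWED_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}


def build_dataset_zip(image_dir: str, zip_path: str) -> list:
    """Validate images in image_dir against the documented limits and zip them.

    Files with other extensions are skipped. Returns the archived filenames;
    raises ValueError if a check fails (before anything is written).
    """
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in ALLOWED_SUFFIXES
    )
    if not MIN_IMAGES <= len(images) <= MAX_IMAGES:
        raise ValueError(f"need {MIN_IMAGES}-{MAX_IMAGES} images, found {len(images)}")
    too_big = [p.name for p in images if p.stat().st_size > MAX_BYTES]
    if too_big:
        raise ValueError(f"over 50 MB: {', '.join(too_big)}")
    # ZIP_STORED: JPEG/PNG/WebP are already compressed, so don't recompress.
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_STORED) as zf:
        for p in images:
            zf.write(p, arcname=p.name)  # flat layout, images at archive root
    return [p.name for p in images]
```

Usage: `build_dataset_zip("stellar-images", "stellar-dataset.zip")`, then upload the result and pass its URL as inputs.dataset.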
Submitting the training job
Training is an asynchronous task. Submit it once, then poll for completion. The request:
```json
[
  {
    "taskType": "training",
    "taskUUID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "model": "exactly:illustrative@training",
    "inputs": {
      "dataset": "https://example.com/stellar-dataset.zip"
    },
    "importModel": {
      "air": "yourorg:exactly-illustrative@stellar",
      "name": "Stellar",
      "shortDescription": "Editorial cosmic illustration in the Stellar brand style",
      "version": "1.0.0",
      "private": true
    }
  }
]
```

The platform acknowledges the submission immediately:

```json
{
  "data": [
    {
      "taskType": "training",
      "taskUUID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
    }
  ]
}
```

The inputs.dataset is the ZIP, hosted at a public URL the platform can fetch.
The importModel block embeds a Model Upload task into the training request. The fields are the same ones the standalone task accepts, but bundled here so the platform trains the model and registers it under your chosen AIR (yourorg:exactly-illustrative@stellar in this example) in one round trip rather than two. After training finishes, the model at that AIR becomes live and usable in any standard imageInference request. The private flag controls whether it's visible only to your account or surfaced platform-wide. Set it to true while you iterate.
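If you submit training jobs from code, a small builder keeps the fixed fields in one place and generates a fresh taskUUID per submission. This is a sketch: `build_training_task` is a hypothetical helper, generating taskUUID with `uuid.uuid4()` is an assumption (the guide only shows a UUID-formatted value), and it stops at producing the JSON body because the endpoint URL and authentication aren't covered in this guide.

```python
import json
import uuid


def build_training_task(dataset_url: str, air: str, name: str,
                        short_description: str, version: str,
                        private: bool = True) -> list:
    """Build the training request body, with an embedded importModel block."""
    return [{
        "taskType": "training",
        "taskUUID": str(uuid.uuid4()),  # fresh ID per submission
        "model": "exactly:illustrative@training",
        "inputs": {"dataset": dataset_url},
        # Embedded Model Upload task: registers the trained model under `air`.
        "importModel": {
            "air": air,
            "name": name,
            "shortDescription": short_description,
            "version": version,
            "private": private,  # keep True while iterating
        },
    }]


if __name__ == "__main__":
    body = build_training_task(
        "https://example.com/stellar-dataset.zip",
        "yourorg:exactly-illustrative@stellar",
        "Stellar",
        "Editorial cosmic illustration in the Stellar brand style",
        "1.0.0",
    )
    print(json.dumps(body, indent=2))
```

Keep the returned taskUUID: it is the handle you poll with.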
The AIR follows the format provider:model@version, where the provider is your organization namespace. Pick something stable like yourorg:brand-illustration@1. If you later retrain with an improved dataset, bump to @2 rather than overwriting. Keeping versions lets you compare and roll back.
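The versioning convention is easy to enforce in tooling. The sketch below splits an AIR into its provider, model, and version parts and bumps an integer version. The regex and both helper names are assumptions for illustration (the guide doesn't specify which characters are allowed), and the bump deliberately refuses non-numeric versions such as the @stellar tag used in this guide's example.

```python
import re

# provider:model@version. Character classes are an assumption: any run of
# characters that isn't ':', '@', or whitespace.
AIR_PATTERN = re.compile(
    r"^(?P<provider>[^:@\s]+):(?P<model>[^:@\s]+)@(?P<version>[^:@\s]+)$"
)


def parse_air(air: str) -> tuple:
    """Split an AIR into (provider, model, version); raise ValueError if malformed."""
    m = AIR_PATTERN.match(air)
    if m is None:
        raise ValueError(f"not a provider:model@version AIR: {air!r}")
    return m.group("provider"), m.group("model"), m.group("version")


def bump_air(air: str) -> str:
    """Return the same AIR with its integer version incremented (@1 -> @2)."""
    provider, model, version = parse_air(air)
    if not version.isdecimal():
        raise ValueError(f"version {version!r} is not an integer; bump it manually")
    return f"{provider}:{model}@{int(version) + 1}"


if __name__ == "__main__":
    print(bump_air("yourorg:brand-illustration@1"))  # yourorg:brand-illustration@2
```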
Tracking progress
Training takes about two hours end to end. The submission returns an immediate acknowledgment with the taskUUID. You poll for status by submitting a getResponse task with that UUID:
```json
[
  {
    "taskType": "getResponse",
    "taskUUID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  }
]
```

While training is still running, the response reports progress:

```json
{
  "data": [
    {
      "taskType": "training",
      "taskUUID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "status": "processing",
      "progress": 47
    }
  ]
}
```

Once training succeeds, the response carries the model's AIR:

```json
{
  "data": [
    {
      "taskType": "training",
      "taskUUID": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "status": "success",
      "air": "yourorg:exactly-illustrative@stellar"
    }
  ]
}
```

The status field moves from processing to either success or error. Polling every 5 to 10 minutes is plenty given the roughly two-hour wall time. On success, the model at your reserved AIR is live. On error, the response includes an error object with code and message fields you can act on (most common cause: a dataset that didn't meet the format or count requirements).
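The polling loop is short enough to write once and reuse. This is a sketch under stated assumptions: `send` is a hypothetical stand-in for your HTTP client (a function that takes the task list and returns the parsed JSON response), `wait_for_training` and `TrainingFailed` are illustrative names, the four-hour timeout is an arbitrary safety margin, and the error object is assumed to sit on the task entry in data, which the guide doesn't state explicitly.

```python
import time
from typing import Callable

# Hypothetical transport: takes a task list, returns the parsed JSON response.
# Wire it to your own HTTP client; the endpoint and auth aren't covered here.
Send = Callable[[list], dict]


class TrainingFailed(RuntimeError):
    """Raised when the training task reports status "error"."""


def wait_for_training(send: Send, task_uuid: str,
                      interval_s: float = 300,
                      timeout_s: float = 4 * 3600) -> dict:
    """Poll with getResponse until the task succeeds, fails, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        resp = send([{"taskType": "getResponse", "taskUUID": task_uuid}])
        result = resp["data"][0]
        status = result.get("status")
        if status == "success":
            return result  # result["air"] is the model that is now live
        if status == "error":
            # Assumption: the error object is attached to the task entry.
            err = result.get("error") or {}
            raise TrainingFailed(f"{err.get('code')}: {err.get('message')}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_uuid} still {status!r} after {timeout_s}s")
        time.sleep(interval_s)  # 300 s = 5 minutes, the low end of the suggested range
```

Injecting `send` rather than hard-coding an HTTP call keeps the loop testable with a fake that returns canned processing and success responses.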
Generating with the trained model
Once training is complete, the trained model behaves like any other image-generation model on the platform. Pass its AIR in the model field of a regular imageInference request, and prompt for the subject you want, without restating the style cues. The model has the style baked in, and restating them often makes the output feel overcooked.
```json
[
  {
    "taskType": "imageInference",
    "taskUUID": "b2c3d4e5-f6a7-8901-bcde-f23456789012",
    "model": "yourorg:exactly-illustrative@stellar",
    "positivePrompt": "A red fox sitting on a mossy rock at dusk, gazing upward at the aurora",
    "width": 1024,
    "height": 1024
  }
]
```

None of these subjects or formats appeared in the Stellar training set. The mix below pairs scenic illustrations with a product mockup and an editorial banner to show that the learned style holds across both decorative and functional outputs:
A wide editorial blog header with the title "BEYOND THE VEIL" in large gold serif typography centered on the left half, a small engraved illustration of an observatory dome and night sky on the right half, deep navy background with scattered stars, fine line work, banner layout
A vintage brass compass resting on an open hand-drawn star chart, warm candlelight reflecting off the compass glass
A mountain range with three snow-capped peaks viewed from a distance
A dark navy ceramic mug on a wooden desk, printed with a gold constellation map wrapping around the surface and the word STELLAR in small serif type near the rim, steam rising from the mug, soft warm light
A tall Victorian clock tower rising against a deep starry sky, warm light glowing from the clock face, iron filigree details on the tower
A red fox sitting on a mossy rock at dusk, gazing upward at a sky filled with faint aurora streaks and scattered stars
Each prompt is a plain subject description with no style cues. The deep navy and warm gold palette, the engraving line work, the dreamy mood, the grain texture, all come from the model.
The trained model accepts the same parameter surface as any other Exactly Illustrative model: a positivePrompt, a width and height between 1024 and 2048 (in multiples of 64), and optionally a single inputs.referenceImages entry for image-to-image (type sketch with adjustable strength, or type reference for style guidance). The settings.quality parameter toggles between the default pipeline ("low") and the high-fidelity pipeline ("high"). Pass the AIR in the model field, and that's the only change from generating with the base Exactly Illustrative models.
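The size and quality rules above are simple to check client-side before a request goes out. A minimal sketch for text-to-image only: `build_inference_task` is a hypothetical helper, and it leaves out inputs.referenceImages because this guide doesn't spell out the field layout of a reference entry. Omitting settings entirely is assumed to select the default ("low") pipeline, as described above.

```python
import uuid
from typing import Optional

VALID_QUALITIES = {"low", "high"}  # "low" is the default pipeline


def build_inference_task(air: str, prompt: str, width: int = 1024,
                         height: int = 1024,
                         quality: Optional[str] = None) -> list:
    """Build a text-to-image request for a trained model, checking size rules."""
    for name, value in (("width", width), ("height", height)):
        # Each dimension: 1024..2048 inclusive, and a multiple of 64.
        if not 1024 <= value <= 2048 or value % 64:
            raise ValueError(f"{name} must be a multiple of 64 in 1024..2048, got {value}")
    task = {
        "taskType": "imageInference",
        "taskUUID": str(uuid.uuid4()),
        "model": air,  # the AIR from your training response
        "positivePrompt": prompt,  # subject only; the style is baked in
        "width": width,
        "height": height,
    }
    if quality is not None:
        if quality not in VALID_QUALITIES:
            raise ValueError(f"quality must be 'low' or 'high', got {quality!r}")
        task["settings"] = {"quality": quality}
    return [task]


if __name__ == "__main__":
    print(build_inference_task("yourorg:exactly-illustrative@stellar",
                               "A vintage compass resting on an open star chart",
                               1536, 1536, "high"))
```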
Replace yourorg:exactly-illustrative@stellar with the AIR from your own training response. The request is otherwise ready to run:
```json
[
  {
    "taskType": "imageInference",
    "taskUUID": "c3d4e5f6-a7b8-9012-cdef-345678901234",
    "model": "yourorg:exactly-illustrative@stellar",
    "positivePrompt": "A vintage compass resting on an open star chart, soft candlelight",
    "width": 1536,
    "height": 1536,
    "settings": {
      "quality": "high"
    }
  }
]
```

Tips
A first training rarely produces the final model. Treat it as a loop: train, evaluate on subjects outside the training set, identify what the model didn't learn (or learned wrong), adjust the dataset, retrain.
- Test on subjects not in your dataset. Generate 10 to 20 images on subjects deliberately absent from your training set. If the style transfers cleanly, the dataset taught the right thing. If the model only produces good results for subjects similar to those it saw, your dataset is too subject-narrow. Widen the subject variety while holding the style constant.
- Watch for accidental consistencies. If every image in your dataset happens to share something beyond the style (most are landscapes, most have a centered composition, most are low-key lit), the model will absorb that as part of the style. Audit the dataset against your style brief and remove anything that introduces an unintended pattern.
- Version your datasets. Keep the ZIP and the prompts that generated or curated it under version control alongside your code. Bump importModel.version when you retrain. This lets you A/B old versus new on the same prompts and gives you a clean rollback path.
- Prefer ten strong images over fifty mediocre ones. The upper limit exists for cases where your style genuinely has 50 distinct exemplars. Most brands don't. Adding noise to reach the cap will reduce, not improve, output consistency.
- Quote in-brand wordmarks if you need them. When you want the brand name to appear in the rendered image (on signage, packaging, or a UI mockup), wrap it in straight quotes inside the prompt: a storefront sign reading "STELLAR". The quotes tell the model to render the exact string rather than paraphrase it, and the trained model picks up the style automatically.
- Use settings.quality: "high" for final output. The high-fidelity pipeline produces noticeably better results at the cost of longer generation times. Default to "low" while iterating on prompts, then switch to "high" for the assets you ship.
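The held-out testing and versioning tips combine naturally: run the same out-of-dataset prompts through the previous and the retrained model and compare the outputs pair by pair. This is a sketch under an explicit assumption: that the API accepts several imageInference tasks in one request array (every example in this guide sends just one, so send them individually if batching isn't supported). `build_ab_eval` and the prompt list are illustrative, not part of the platform.

```python
import uuid


def build_ab_eval(prompts: list, air_old: str, air_new: str) -> tuple:
    """Pair every held-out prompt with both model versions.

    Returns (tasks, index): the imageInference task list, and a map from each
    taskUUID to (air, prompt) so results can be lined up side by side later.
    """
    tasks, index = [], {}
    for prompt in prompts:
        for air in (air_old, air_new):
            task_uuid = str(uuid.uuid4())
            tasks.append({
                "taskType": "imageInference",
                "taskUUID": task_uuid,
                "model": air,
                "positivePrompt": prompt,  # same prompt for both versions
                "width": 1024,
                "height": 1024,
            })
            index[task_uuid] = (air, prompt)
    return tasks, index


if __name__ == "__main__":
    held_out = [  # subjects that should NOT appear in the training set
        "A red fox sitting on a mossy rock at dusk",
        "A tall Victorian clock tower against a deep starry sky",
    ]
    tasks, index = build_ab_eval(held_out, "yourorg:brand-illustration@1",
                                 "yourorg:brand-illustration@2")
    print(len(tasks))  # 4
```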