Black Forest Labs

FLUX Virtual Try-On

Low-latency virtual try-on for transferring garments onto a person image with strong identity and garment fidelity

Image to Image, Edit

FLUX Virtual Try-On Overview

FLUX Virtual Try-On is a virtual try-on image editing model from Black Forest Labs that generates apparel try-on results from a person image plus one or more garment references. It is tuned to preserve the subject's face and pose while transferring garments with strong logo, print, stitching, and hardware fidelity, making it suitable for catalog-scale styling, product visualization, outfit transfer, and shopper-facing try-on workflows. It supports multi-garment composition, seeded generation, and output sizes up to 2 megapixels.

From $0.0425 / image
  • 1 reference input image: $0.0425
  • 2 reference input images: $0.0475
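The listed per-image prices make batch budgeting a matter of simple arithmetic. The sketch below covers only the two tiers shown above (one and two reference input images) and uses Decimal to avoid floating-point rounding on currency amounts. estimate_cost is a hypothetical helper, and it assumes the tier is chosen purely by the reference-image count, as the price list suggests.

```python
from decimal import Decimal

# Per-image prices from the price list, keyed by number of reference input images.
# Only the 1- and 2-reference tiers are listed, so other counts are rejected.
PRICE_PER_IMAGE = {
    1: Decimal("0.0425"),
    2: Decimal("0.0475"),
}


def estimate_cost(num_images: int, ref_images: int) -> Decimal:
    """Estimate the cost of `num_images` try-on results, each using `ref_images` references."""
    if num_images < 0:
        raise ValueError("num_images must be non-negative")
    if ref_images not in PRICE_PER_IMAGE:
        raise ValueError(f"no listed price for {ref_images} reference images")
    return PRICE_PER_IMAGE[ref_images] * num_images


if __name__ == "__main__":
    # A 1,000-image catalog run with one reference image per result.
    print(estimate_cost(1000, 1))  # 42.5000
    # The same run with two reference images per result.
    print(estimate_cost(1000, 2))  # 47.5000
```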

Commercial use

How to Use FLUX Virtual Try-On

Overview

FLUX Virtual Try-On is a dedicated apparel try-on model that generates a styled result from a person image plus one or more garment references.

It is built for workflows where the goal is not broad prompt-driven image editing, but realistic garment transfer that preserves the subject's face and pose while keeping product details intact.

Strengths

Identity and Pose Preservation

FLUX Virtual Try-On is designed to keep the subject's face, pose, and overall body presentation stable while changing what they are wearing. This makes it useful for shopper-facing try-on flows and catalog-style visualization where continuity matters.

Strong Garment Fidelity

The model is tuned to retain logos, prints, stitching, hardware, and other product-defining details. That makes it a better fit than generic editing models when the garment itself needs to remain recognizable and brand-accurate.

Multi-Garment Composition

The workflow supports applying more than one garment to the same subject. This makes it possible to transfer partial outfits, combine layers, or build a full styled look from garment references.

Low-Latency Editing Workflow

The model is positioned for production try-on experiences where speed matters. It is suitable for high-volume catalog and shopper-interaction workflows rather than one-off manual composites.

Capabilities

Person-and-Garment Try-On

FLUX Virtual Try-On accepts a person image and a garment reference, then generates an edited result that places the garment on the subject.

Outfit Transfer and Layering

The model can be used for single-piece try-on as well as multi-garment styling, including layered combinations assembled from reference assets.
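A multi-garment request could be assembled as one person image plus a list of garment references. This is a minimal sketch: only the AIR ID bfl:flux@vto and the control names (width, height, steps, seed, output format) come from this page. The function build_tryon_request, every JSON key name, and the example URLs are hypothetical and do not reflect a documented API schema.

```python
import json
from typing import Optional


def build_tryon_request(
    person_image: str,
    garment_images: list[str],
    prompt: str,
    width: int,
    height: int,
    output_format: str,
    steps: Optional[int] = None,
    seed: Optional[int] = None,
) -> dict:
    """Assemble a hypothetical FLUX Virtual Try-On request payload.

    Only the model ID comes from the model page; all key names are
    illustrative assumptions.
    """
    if not garment_images:
        raise ValueError("at least one garment reference is required")

    request = {
        "model": "bfl:flux@vto",                # AIR ID listed on the page
        "prompt": prompt,                       # styling prompt (category, fit, drape)
        "personImage": person_image,            # hypothetical key: the subject
        "garmentImages": list(garment_images),  # hypothetical key: one or more garments
        "width": width,
        "height": height,
        "outputFormat": output_format,          # hypothetical key name
    }
    # Optional controls are only included when explicitly set.
    if steps is not None:
        request["steps"] = steps
    if seed is not None:
        request["seed"] = seed
    return request


if __name__ == "__main__":
    # Layered look: a jacket over a shirt, applied to the same subject.
    payload = build_tryon_request(
        person_image="https://example.com/model.jpg",
        garment_images=[
            "https://example.com/shirt.jpg",
            "https://example.com/jacket.jpg",
        ],
        prompt="Open denim jacket layered over a white crew-neck shirt, relaxed fit",
        width=1024,
        height=1024,
        output_format="png",
        seed=42,
    )
    print(json.dumps(payload, indent=2))
```

Keeping garments as a list lets the same builder cover single-piece try-on (a one-element list) and layered outfits without separate code paths.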

Prompt-Guided Styling Control

The API accepts a natural-language prompt that describes the garment category, fit, and drape, while the person's identity and pose stay fixed.

Seeded Generation

The model supports a seed parameter for more reproducible iterations when refining the same try-on setup.
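One way to use the seed when refining a setup is to pin it across iterations and change only the prompt, so differences between outputs are more attributable to the prompt edit. This sketch assumes a hypothetical payload dict with "seed" and "prompt" keys; prompt_variants is a hypothetical helper, and how closely a fixed seed reproduces results depends on the service.

```python
import copy


def prompt_variants(base_request: dict, prompts: list[str], seed: int) -> list[dict]:
    """Return one copy of base_request per prompt, all pinned to the same seed.

    base_request is a hypothetical payload dict and is not modified.
    """
    variants = []
    for prompt in prompts:
        request = copy.deepcopy(base_request)  # keep nested lists independent per variant
        request["seed"] = seed                 # same seed for every variant
        request["prompt"] = prompt             # only the prompt varies
        variants.append(request)
    return variants


if __name__ == "__main__":
    base = {
        "model": "bfl:flux@vto",
        "personImage": "https://example.com/model.jpg",     # hypothetical key
        "garmentImages": ["https://example.com/coat.jpg"],  # hypothetical key
        "width": 1024,
        "height": 1024,
    }
    prompts = [
        "Wool overcoat, buttoned, tailored fit",
        "Wool overcoat, open, relaxed fit, falling below the knee",
    ]
    for request in prompt_variants(base, prompts, seed=1234):
        print(request["seed"], request["prompt"])
```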

Input and Output

  • AIR ID: bfl:flux@vto
  • Input: one person image, one or more garment references, and a styling prompt
  • Output: edited try-on image
  • Output cap: up to 2 megapixels
  • Controls: width, height, steps, seed, output format
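The 2-megapixel cap can be turned into concrete width and height values for a target aspect ratio. This sketch rests on two assumptions not stated on the page: that "2 megapixels" means 2,000,000 pixels, and that each side should be a multiple of 16. fit_dimensions is a hypothetical helper.

```python
import math

MAX_PIXELS = 2_000_000  # assumption: "up to 2 megapixels" read as 2,000,000 pixels
MULTIPLE = 16           # assumption: width and height snapped down to multiples of 16


def fit_dimensions(aspect_w: int, aspect_h: int,
                   max_pixels: int = MAX_PIXELS,
                   multiple: int = MULTIPLE) -> tuple[int, int]:
    """Largest (width, height) near aspect_w:aspect_h whose area stays within max_pixels."""
    if aspect_w <= 0 or aspect_h <= 0:
        raise ValueError("aspect ratio terms must be positive")
    # Scale k such that (k * aspect_w) * (k * aspect_h) == max_pixels.
    k = math.sqrt(max_pixels / (aspect_w * aspect_h))
    # Rounding each side down keeps the product at or below the cap.
    width = int(k * aspect_w) // multiple * multiple
    height = int(k * aspect_h) // multiple * multiple
    return width, height


if __name__ == "__main__":
    print(fit_dimensions(3, 4))  # (1216, 1632)
    print(fit_dimensions(1, 1))  # (1408, 1408)
```

A 3:4 portrait frame comes out at 1216 x 1632 (1,984,512 pixels), just under the assumed cap.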

Best Fit

  • Apparel try-on experiences
  • Catalog-scale product visualization
  • Outfit transfer and styling previews
  • Shopper-facing retail workflows
  • Brand-sensitive garment presentation

More models from Black Forest Labs

FLUX Erase is a dedicated image editing model from Black Forest Labs for removing unwanted objects, text, or distractions from an image while reconstructing the surrounding content naturally. It is tuned for erase-style workflows where the goal is not to replace an element with something new, but to leave a clean, coherent result that matches nearby structure, lighting, texture, and scene context.

FLUX Outpainting (API only) is a dedicated image expansion model from Black Forest Labs that extends an existing image beyond its original borders in a single call. It is tuned to continue scene content, lighting, texture, and composition naturally without requiring a text prompt, making it useful for aspect-ratio changes, social reformats, banner layouts, and other canvas-extension workflows.

FLUX.2 [klein] 9B KV is a KV-cache optimized variant of the Klein 9B model that caches reference image key-value pairs after the first denoising step, skipping redundant computation on subsequent steps. This delivers up to 2.5x faster inference for multi-reference editing tasks while retaining all capabilities of the standard Klein 9B, including sub-second text-to-image and advanced editing in 4 steps.

FLUX.2 [klein] 9B is a 4-step distilled image generation and editing model designed for sub-second inference without sacrificing visual quality. It unifies text-to-image and advanced editing workflows in a single model, making it suitable for interactive applications, real-time previews, and latency-critical production use.

FLUX.2 [klein] 4B is a 4-step distilled image generation and editing model optimized for ultra-low latency inference. It delivers near real-time performance with strong visual quality, enabling interactive workflows and responsive production systems on more constrained hardware.

FLUX.2 [klein] 9B Base is the undistilled foundation model of the Klein family, offering full model capacity for image generation and editing. It is optimized for fine-tuning, customization, and post-training workflows where flexibility, control, and maximum training signal are required.

FLUX.2 [klein] 4B Base is a compact undistilled image generation and editing model with an exceptional quality-to-size ratio. It is well suited for local deployment, efficient fine-tuning, and custom pipelines that require flexibility on limited hardware.

FLUX.2 [max] is a high-precision text-to-image and image editing model from Black Forest Labs that generates visuals grounded in real-time information via live web search. It delivers maximum prompt adherence with multi-reference editing and state-of-the-art consistency across identities, objects, and details.

FLUX.2 [pro] is a flow-matching latent transformer for precise text-to-image synthesis and reference-guided editing. It supports multi-image references, 4MP outputs, and Mistral-based text conditioning for controllable composition and robust iterative edits that preserve structure.

FLUX.2 [flex] is a configurable text-to-image and image editing model built for precise text placement and stable layouts. It exposes sampling and guidance controls and supports up to ten reference images for consistent characters or products across complex compositions.

FLUX.2 [dev] is an open-weight text-to-image and image editing model from Black Forest Labs. It targets developers who need precise control over prompts, references, and iteration. Use it for non-commercial research, workflow prototyping, and multi-conditioning image pipelines.

FLUX.1 Krea [dev] is an open-weight text-to-image model from Black Forest Labs and Krea AI. It targets opinionated aesthetics and realistic photography. Developers can drop it into FLUX.1 [dev] workflows to build custom generators that avoid the typical AI look.

FLUX.1 Kontext [max] is a high-quality text-to-image model for production workflows. It focuses on prompt accuracy, sharp local edits, and premium typography rendering. Use it for detailed visual design, branded visuals, and consistent, character-safe image generation.

FLUX.1 Kontext [pro] combines fast text-to-image generation with precise image editing. It supports reference images, local region edits, and full scene changes while preserving style and character identity. Ideal for iterative workflows in design, product visuals, and storytelling pipelines.

FLUX.1 Kontext [dev] is an open-weight image editing model tuned for fast iterative development. It supports local edits and full scene changes from prompts. Use it for style transfer, background swaps, object edits, and character-consistent variations in your pipelines.

FLUX.1 Fill [dev] is an open image editing model for inpainting and outpainting with text guidance and masks. It can replace objects, extend scenes, and adjust regions while preserving context. Ideal for pipelines that need controllable edits on real or generated images.

FLUX.1 Depth [dev] uses input depth maps as structural guidance for new image generation. It preserves scene geometry and layout while allowing creative control over style and details. Ideal for developers who need stable composition in controlled image synthesis.

FLUX.1 Expand [pro] is an outpainting model that extends images beyond their original frame while preserving structure, lighting, and style. It supports controlled expansion from real or generated inputs and integrates into image editing or generative workflows that need precise, coherent borders.

FLUX.1.1 [pro] Ultra is a high-resolution text-to-image model from Black Forest Labs. It generates images up to 4 megapixels in about 10 seconds. Ultra mode targets sharp outputs, while Raw mode targets a natural photographic style. Built for API integration in real products.

FLUX.1 Canny [dev] is a 12B-parameter rectified flow transformer for image generation. It takes a text prompt and an input image, extracts Canny edges as structural guidance, and then generates new images that follow the original composition while applying the prompt.

FLUX.1 Fill [pro] provides advanced inpainting and outpainting for real and generated images. Supply an input image, a mask, and a text prompt. The model fills or extends regions with seamless content that matches context and style. Ideal for edits, layout fixes, and content-aware expansion.

FLUX.1.1 [pro] is a flagship text-to-image model from Black Forest Labs. It improves on FLUX.1 with sharper detail, stronger prompt adherence, and faster sampling. Ideal for production image pipelines, product visuals, and creative tools that require consistent high-quality output.

FLUX.1 [schnell] is an open-source text-to-image model from Black Forest Labs. It uses 4-step distillation for very fast generation with strong visual quality. Ideal for local deployment, rapid prototyping, batch image production, and integration into custom creative pipelines.

FLUX.1 [dev] is a 12B-parameter text-to-image model from Black Forest Labs. It targets high-fidelity visual generation for research and non-commercial use. Developers can build image apps that need strong prompt following and fine visual detail at high resolution.