
Changelog

The latest updates and improvements to the Runware platform.

ImagineArt 2.0

MODEL

ImagineArt 2.0 is a reasoning-based text-to-image model for instruction-faithful generation across realistic and stylized outputs.

  • Uses reasoning-based generation for more accurate and nuanced prompt following.
  • Supports a wide range of styles, from photorealistic imagery to illustration and anime.
  • New color pipeline improves vibrancy and true-to-life color rendering.

MiniMax Music 2.6 & Music Cover

MODEL

Two MiniMax music models are now available for song generation and melody-preserving audio transformation.

MiniMax Music 2.6

  • Generates full songs and instrumentals from short prompts or detailed production-style instructions.
  • Supports prompt-level control over key, emotional arc, and song structure, including sections such as intro, verse, chorus, and bridge.
  • Returns the first audio clip in under 20 seconds for faster iteration.

MiniMax Music Cover

  • Reworks an uploaded track in a new style while preserving the original melody.
  • Supports prompt-controlled changes to genre, instrumentation, arrangement, and vocal character.

PixVerse v6

MODEL

PixVerse v6 is now available for end-to-end video generation with native audio output.

  • Generate video with dialogue, sound effects, and music in a single pass.
  • Up to 15 seconds at up to 1080p.
  • Improved consistency for characters and smoother transitions across shots, with directed camera motion.
  • Currently ranked #4 for Image-to-Video on Artificial Analysis.

sync-3

MODEL

sync-3 is a lip sync model for full-shot generation with native 4K and super resolution support.

  • Processes the entire shot in one pass instead of chunking, improving consistency across frames.
  • Handles close-ups, extreme face angles, partially obscured faces, and obstructed mouths.
  • Preserves voice, tone, and emotional expression across 95+ languages.

GPT-5.4, 5.4 Mini & 5.4 Nano

MODEL

Three new OpenAI GPT-5.4 multimodal models for advanced reasoning, coding workflows, and high-volume automation.

GPT-5.4

  • Flagship reasoning model with a 1 million token context window.
  • Integrates native computer use and coding capabilities with configurable reasoning effort for complex tasks.
  • Improved factual accuracy suitable for production-grade writing, analysis, and execution.

GPT-5.4 Mini

  • Efficient, lower-cost variant built for coding assistants and subagent orchestration.
  • Supports a 400K token context window with native computer use and faster responsiveness.
  • Balances high-quality text and image understanding for mid-scale workloads.

GPT-5.4 Nano

  • Ultra-low-latency, lightweight model for high-throughput classification, data extraction, and ranking.
  • Optimized for maximum speed and cost efficiency in high-volume, continuous workflows.

ACE-Step v1.5

MODEL

ACE-Step v1.5 is an open-source music generation model with voice cloning, lyric editing, and multilingual support.

  • Generates full tracks from text prompts with support for voice cloning, remixing, and cover generation.
  • Text-to-audio and audio-to-audio workflows across a wide range of genres.
  • Compositions up to 10 minutes with support for over 50 languages.
  • Turbo variant available for faster generation.

Exactly.ai Model Family

MODEL

16 new Exactly.ai models for style-consistent image generation across illustration and photography.

  • 10 illustration styles and 6 photographic styles, each trained with a specific artist for consistent output across generations.
  • Outputs stay visually coherent without drifting between styles across batches.
  • Suited for creative agencies, marketing teams, and content platforms generating images at scale with a consistent visual identity.

Wan2.7 Video

MODEL

Wan2.7 is Alibaba's most capable video model, supporting text-to-video, image-to-video, reference-to-video, and instruction-based video editing.

  • Instruction-based video editing to add, remove, or swap elements, change backgrounds, and restyle scenes from text.
  • Reuse motion sequences, camera movements, and visual styles from reference videos.
  • Lock character appearance and voice across shots with up to 5 subject references and 40+ facial expressions.
  • Output at 720p or 1080p, 30fps with native audio and auto-dubbing support.

P-Image Upscale

MODEL

P-Image Upscale is an AI-powered image upscaling model that enhances resolution while preserving detail and realism.

  • Upscale images to a target resolution between 1 and 8 megapixels.
  • Optional detail enhancement to sharpen textures and fine elements.
  • Realism enhancement to improve photographic quality and depth.
  • Sub-second inference for fast, scalable upscaling workflows.
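
As a rough sketch of how such an upscale request might be parameterized — the task and field names below are illustrative assumptions, not the documented Runware API:

```python
import json
import uuid

def build_upscale_task(image_url, megapixels,
                       enhance_detail=False, enhance_realism=False):
    """Build a hypothetical upscale request payload.

    The target resolution is expressed in megapixels (1-8), matching
    the range described above; all field names are placeholders.
    """
    if not 1 <= megapixels <= 8:
        raise ValueError("target resolution must be between 1 and 8 megapixels")
    return {
        "taskType": "imageUpscale",       # illustrative task name
        "taskUUID": str(uuid.uuid4()),
        "inputImage": image_url,
        "targetMegapixels": megapixels,   # illustrative field name
        "enhanceDetail": enhance_detail,
        "enhanceRealism": enhance_realism,
    }

task = build_upscale_task("https://example.com/photo.jpg", 4,
                          enhance_detail=True)
print(json.dumps(task, indent=2))
```

The validation mirrors the 1–8 megapixel range stated above, so an out-of-range request fails before being sent.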

Veo 3.1 Lite

MODEL

Veo 3.1 Lite is the most cost-effective model in the Veo family, matching Veo 3.1 Fast speed at roughly half the price.

  • Text-to-video, image-to-video, and reference-to-video with 4s, 6s, and 8s output durations.
  • Native audio generation with dialogue, sound effects, and ambient audio synchronized with video.
  • Supports 16:9 and 9:16 aspect ratios at up to 1080p resolution.

Wan2.7 Image

MODEL

Wan2.7 Image is a unified image generation and editing model with improved human figures, text rendering, and color output.

  • Sharper human figure generation with more realistic proportions and detail.
  • Improved multilingual text rendering supporting up to 3,000 tokens in 12 languages.
  • More vibrant, saturated color output compared to previous Wan versions.
  • Supports text-to-image, image-to-image, and region-based editing in a single model.
  • Pro variant available for higher-fidelity output.

Topaz Labs Starlight Precise 2.5

MODEL

Topaz Labs Starlight Precise 2.5 is a diffusion-based video upscaling model with high-fidelity detail preservation.

  • Upscale video to 1080p or 4K, retaining the fine textures and edges that other models soften.
  • Full temporal consistency across frames with natural skin textures and faces.
  • Suited for film restoration, content upscaling pipelines, and production-grade video enhancement.

FLUX.2 [klein] 9B KV

MODEL

FLUX.2 [klein] 9B KV is a KV-cache optimized variant of Klein 9B with up to 2.5x faster multi-reference editing.

  • All capabilities of FLUX.2 [klein] 9B: sub-second generation, text-to-image, and multi-reference editing in a single model.
  • KV-cache support delivers up to 2.5x faster inference for multi-reference editing workflows.
  • Built for interactive apps and real-time editing pipelines.

xAI Text-to-Speech

MODEL

xAI Text-to-Speech is an expressive, low-latency text-to-speech model with multilingual support.

  • Natural speech synthesis with inline controls for pauses, laughter, whispers, and emphasis.
  • 5 distinct voices (Ara, Eve, Leo, Rex, Sal) across 20+ languages with automatic language detection.
  • Low-latency synthesis suited for voice agents, real-time applications, and multilingual content at scale.
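
Only the five voice names come from the entry above; the payload shape below is a hedged sketch, not the documented request format:

```python
import uuid

# The five voices listed for xAI Text-to-Speech.
VOICES = {"Ara", "Eve", "Leo", "Rex", "Sal"}

def build_tts_task(text, voice="Ara"):
    """Build a hypothetical TTS request payload.

    Rejects unknown voices up front; field names are placeholders.
    """
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; choose one of {sorted(VOICES)}")
    return {
        "taskType": "audioInference",  # illustrative task name
        "taskUUID": str(uuid.uuid4()),
        "text": text,
        "voice": voice,
    }

task = build_tts_task("Welcome to the changelog.", voice="Eve")
```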

HeyGen Video Agent & Avatar IV

MODEL

Two HeyGen models for end-to-end AI video creation and photorealistic avatar generation.

HeyGen Video Agent

  • Prompt-to-video pipeline with automated scripting, voiceover, avatar selection, B-roll, and editing.
  • Designed for marketing, sales, training, and social content at scale.

HeyGen Avatar IV

  • Photorealistic avatars with micro-expressions, natural gestures, and audio-driven emotion.
  • Granular expressiveness controls with faster generation times.
  • Suited for enterprise training, multilingual content, and scalable avatar-based video.

Seedance 1.0 Lite & Seedream 3.0 Deprecated

DEPRECATION

ByteDance is deprecating Seedance 1.0 Lite and Seedream 3.0. These models will be fully deactivated on May 13, 2026, with capacity gradually reduced starting March 23.

If you are using these models, please migrate to the recommended replacements before the deactivation date.


TRELLIS.2 & SAM 3D Objects

MODEL

Two new 3D generation models for creating production-ready assets from single images.

TRELLIS.2

  • High-resolution 3D asset generation from a single 2D image input.
  • Produces fully textured, production-ready meshes.
  • Ideal for game assets, VFX, product design, and architectural visualization.

SAM 3D Objects

  • Reconstructs full 3D geometry, realistic textures, and scene layout from a single RGB image.
  • High-fidelity mesh output even in cluttered, occluded real-world scenes.
  • Suited for AR/VR asset creation, e-commerce product visualization, and scene understanding.

Gemini 3.1 Pro, 3.1 Flash Lite & 3 Flash

MODEL

Three Gemini LLMs from Google are now available on Runware.

Gemini 3.1 Pro

  • Google's most capable reasoning model with 77.1% on ARC-AGI-2 and 94.3% on GPQA Diamond.
  • Natively multimodal with text, image, audio, video, and code inputs.
  • Up to 2 million token context window for processing entire codebases, books, or hours of video.
  • Configurable thinking levels for balancing reasoning depth against cost and speed.
  • Enhanced agentic workflows for autonomous development and long-horizon tasks.

Gemini 3.1 Flash Lite

  • Optimized for high-volume, low-latency workloads at a fraction of the cost of Pro.
  • 2.5x faster time-to-first-token and over 360 tokens per second output speed.
  • Up to 1 million token context window with multimodal input support.
  • Configurable thinking levels for fine-grained control over reasoning depth.

Gemini 3 Flash

  • Balanced speed and capability with Gemini 3 Pro-level reasoning at Flash efficiency.
  • 200K token context window with text, image, audio, video, and code inputs.
  • 78% on SWE-bench Verified for strong coding performance across large codebases.

MiniMax M2.7 & M2.5 LLM

MODEL

MiniMax M2.7 and M2.5 are the first LLMs available on Runware, built for complex agentic workflows and real-world productivity.

MiniMax M2.7

  • Near Opus-level software engineering with 56.22% on SWE-Pro.
  • 97% skill adherence rate across 40+ complex, long-horizon tasks.
  • Self-evolving architecture with 30% performance gains across 100+ reinforcement learning iterations.
  • Full end-to-end project delivery including debugging, code security, and refactoring.
  • Highspeed variant available for lower latency.

MiniMax M2.5

  • 80.2% on SWE-Bench Verified with strong agentic performance beyond code generation.
  • 200K context window with built-in thinking mode.
  • Optimized for speed and cost across multi-step tool use tasks.

Grok Imagine Image Pro

MODEL

Grok Imagine Image Pro is xAI's most capable image generation model with higher quality, greater prompt adherence, and finer detail.

  • Photorealistic and stylized image generation from text prompts.
  • Native 2K resolution output for print, product visuals, and professional creative pipelines.
  • Multi-turn iterative refinement with natural language instructions.

Inworld TTS-1.5

MODEL

Inworld TTS-1.5 is a low-latency text-to-speech model available in Mini and Max variants, built for real-time AI applications at scale.

  • Production-grade latency with P90 time-to-first-audio under 250ms (Max) and under 130ms (Mini).
  • 15+ languages supported.
  • Ideal for conversational AI agents, live translation, interactive media, and accessibility tools.

Kling VIDEO 3.0 Motion Control

FEATURE

Kling VIDEO 3.0 now supports Motion Control with facial identity preservation and emotion reproduction.

  • Consistent facial identity preserved across angles and through occlusions.
  • Complex emotion reproduction with high-fidelity restoration.

LTX-2.3

MODEL

LTX-2.3 is a video generation model with a rebuilt VAE, stronger prompt handling, and improved audio-video sync.

  • Rebuilt VAE for improved textures, edge detail, and on-screen text clarity.
  • 4× larger text connector for complex multi-subject prompts with reduced drift.
  • Frame rate options at 24, 25, 48, and 50fps.
  • Significantly reduced audio artifacts with tighter audio-video sync.
  • Fast variant available for lower-latency generation.

Qwen-Image-2.0 and Qwen-Image-2.0-Pro

MODEL

Qwen-Image-2.0 and Qwen-Image-2.0-Pro are unified text-to-image and image editing models with native 2K resolution.

  • Unified generation and editing in a single model.
  • Native 2K resolution with fine detail in skin, fabric, and architecture.
  • Professional typography rendering supporting up to 1,000-token prompts for infographics, posters, slides, and comics.

Kling IMAGE 3.0 and IMAGE O3

MODEL

Two new Kling image models for professional-grade generation and high-fidelity reference-based workflows.

Kling IMAGE 3.0

  • Native 2K resolution with enhanced texture, lighting, and material handling.
  • Image-to-image editing with consistency across iterative subject and layout refinements.

Kling IMAGE O3

  • High-fidelity generation at up to 4K resolution.
  • Supports up to 10 reference images.
  • Face and identity preservation for character consistency.

MiniMax Speech 2.8

MODEL

MiniMax Speech 2.8 is a text-to-speech model with consistent tone across long-form scripts.

  • Consistent voice quality across extended scripts without tonal drift.
  • Low latency suitable for real-time applications and voice agents.
  • Turbo variant available for faster, lower-cost generation.

Nano Banana 2

MODEL

Nano Banana 2 delivers improved reasoning, rendering, and layout control for text-to-image generation.

  • Stronger prompt adherence with real-world scene reasoning and stable multi-subject composition.
  • State-of-the-art text rendering for posters, UI, and packaging.
  • Improved material and lighting realism.
  • Up to 4K output.

P-Video

MODEL

P-Video is an image-to-video model optimized for fast iteration with a draft-to-final rendering workflow.

  • Draft mode generates preview clips at lower cost, with seed reuse for full-quality final renders.
  • Strong input-image consistency with reliable lip sync and native dialogue.
  • High-quality subject and background rendering, including low-resolution asset animation.
  • Up to 1080p resolution at 48fps.
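
The draft-to-final workflow above can be sketched as two requests sharing a seed — a hypothetical payload shape, since the entry does not specify field names:

```python
import random

def build_video_task(prompt, image_url, draft=True, seed=None):
    """Build a hypothetical P-Video request.

    A draft render picks a random seed; reusing that seed in a final
    render reproduces the same motion at full quality, per the
    draft-to-final workflow described above. Field names are placeholders.
    """
    return {
        "taskType": "videoInference",   # illustrative task name
        "inputImage": image_url,
        "positivePrompt": prompt,
        "draftMode": draft,             # illustrative field name
        "seed": seed if seed is not None else random.randrange(2**32),
    }

draft = build_video_task("a fox running through snow",
                         "https://example.com/fox.png")
# Promote the draft to a full-quality render by reusing its seed.
final = build_video_task("a fox running through snow",
                         "https://example.com/fox.png",
                         draft=False, seed=draft["seed"])
```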

Bria FIBO Edit Tools

MODEL

Bria FIBO Edit Tools is a structured image editing model with optional mask-based targeting for localized changes.

  • Preserves original content outside the edit region.
  • Supports recoloring, relighting, restoration, image blending, seasonal effects, and sketch generation.

Seedream 5.0 Lite

MODEL

Seedream 5.0 Lite is a text-to-image and image-to-image model with native 4K output and integrated web search.

  • High-detail image generation up to 4K resolution.
  • Strong text rendering for posters, UI mockups, and branded assets.
  • Consistent multi-image outputs with stable character and layout coherence across batches.
  • Integrates live web search to reflect current events and trending topics in generations.

Kling VIDEO O3

MODEL

Kling VIDEO O3 is now available on the platform in both Standard and Pro variants.

  • Unified text-to-video, image-to-video, and video-to-video workflows.
  • Generates HD video with synchronized native audio.
  • Supports reference-driven generation for stronger subject control.
  • Includes prompt-based video editing with strong temporal consistency to maintain subject stability across clips.

Recraft V4 and V4 Pro

MODEL

Recraft V4 and V4 Pro are professional-grade text-to-image models built for brand and campaign work.

  • Photorealistic and illustration output with structured composition.
  • Native 2K generation with style and color control.

Kling VIDEO 3.0

MODEL

Kling VIDEO 3.0 is available in Standard and Pro variants for cinematic multi-shot video generation with native audio.

  • Text-to-video and image-to-video support.
  • Multi-shot generation with manual or automatic scene planning.
  • Cross-shot consistency for characters and voices across scene cuts.
  • Multi-language audio with lip sync.
  • Up to 15 seconds with professional-grade resolution and reduced artifacts.
  • Native text rendering for signage, captions, and advertising.
  • Audio available as an optional add-on for both variants.

P-Image LoRA and P-Image-Edit LoRA

FEATURE

LoRA support is now available for P-Image and P-Image-Edit.

  • Apply trained or community LoRAs for custom styles, characters, and concepts.
  • Compatible with any HuggingFace LoRA without additional setup.
  • P-Image-Edit LoRA supports style transfers and targeted image transformations.
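
A minimal sketch of attaching a LoRA to a generation request — Runware-style task fields are assumed, and the model and LoRA identifiers are placeholders:

```python
def build_image_task_with_lora(prompt, model, lora_id, weight=1.0):
    """Attach a single LoRA to a hypothetical image generation payload.

    `lora` is a list so multiple adapters could be stacked; the exact
    field names here are assumptions rather than the documented API.
    """
    return {
        "taskType": "imageInference",
        "positivePrompt": prompt,
        "model": model,                               # placeholder model ID
        "lora": [{"model": lora_id, "weight": weight}],
    }

task = build_image_task_with_lora(
    "watercolor portrait of a lighthouse",
    "p-image-placeholder",            # hypothetical P-Image identifier
    "hf-community/watercolor-lora",   # hypothetical HuggingFace LoRA
    weight=0.8,
)
```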

Wan2.6 Flash Reference-to-Video

MODEL

Wan2.6 Flash R2V generates video from short reference clips while preserving visual identity and voice.

  • Automatic lip sync and voice replication from the reference clip.
  • Available at 720p and 1080p with optional audio output.
  • Flash variant optimized for faster inference.

GLM-Image

MODEL

GLM-Image is an open-source 16B parameter hybrid autoregressive and diffusion model.

  • Unified text-to-image and image editing in a single model.
  • Industry-leading text rendering accuracy with strong prompt adherence across photorealistic and stylistic outputs.

Z-Image

MODEL

Z-Image is an open-source 6B parameter diffusion transformer for text-to-image and image-to-image generation.

  • High visual fidelity with strong prompt adherence across diverse styles.
  • Foundation model for distilled variants including Z-Image Turbo.

Vidu Q3 Turbo

MODEL

Vidu Q3 Turbo is a faster variant of Vidu Q3 optimized for rapid iteration.

  • Up to 16 seconds at native 1080p with first and last frame support.
  • Native audio-video sync with dialogue, ambient sounds, and BGM generated in a single pass.
  • Available at 540p, 720p, and 1080p output.

Grok Imagine Image & Video

MODEL

Two new Grok Imagine models for photorealistic image generation and short-form video.

Grok Imagine Image

  • Photorealistic image generation with text-to-image and image-to-image support.

Grok Imagine Video

  • Short-form text-to-video and image-to-video generation.
  • Dynamic motion and camera behavior with native audio output.
  • Output available at 480p and 720p.

Runway Gen-4.5

MODEL

Runway Gen-4.5 delivers unprecedented physical accuracy and expressive character generation.

  • Realistic weight, momentum, and fluid dynamics.
  • Expressive characters with nuanced emotions and natural gestures.
  • Strong prompt adherence for complex camera choreography and scene composition.
  • Text-to-video and image-to-video supported.

LTX-2 19B

MODEL

LTX-2 19B is a production-ready model with synchronized audio generation.

  • Native 4K resolution at 50fps.
  • Up to 20 seconds duration.
  • Prompt enhancer enabled by default to improve video quality.

Reve Image

MODEL

Reve Image is an all-in-one image model for generating, editing, and remixing visuals.

  • Supports style variation and layout changes.
  • Handles semantic edits without rebuilding from scratch.
  • Designed for iterative visual workflows.

Riverflow 2.0

MODEL

Riverflow 2.0 offers reference-based super resolution and custom font control.

  • Reference-based super resolution with high accuracy for product and marketing images.
  • Custom font control with reliable text rendering.
  • Consistent outputs: images are checked and corrected before being returned.

Sora 2 Remix Capability

FEATURE

Remix capability allows you to refine and modify existing videos without regenerating from scratch.

  • Preserves original structure and camera movement for precise control.
  • Suitable for color grading, style transformation, and element replacement.
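
Conceptually, a remix request references an already-generated video instead of starting from scratch. The sketch below is purely illustrative — field names and the strength parameter are assumptions, not the documented API:

```python
def build_remix_task(source_video_id, prompt, strength=0.5):
    """Build a hypothetical remix request.

    `strength` (0-1) is an assumed knob for how far the remix may
    depart from the source while preserving structure and camera
    movement, as described above.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return {
        "taskType": "videoInference",      # illustrative task name
        "referenceVideo": source_video_id, # illustrative field name
        "positivePrompt": prompt,
        "remixStrength": strength,
    }

task = build_remix_task("video-1234", "regrade the scene for golden hour", 0.3)
```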

Vidu Q3

MODEL

Vidu Q3 generates synchronized video and audio from text or images.

  • Sound effects, dialogue, and subtitles in a single output — no stitching required.
  • Multi-shot support with automatic shot planning or prompt-based control.
  • Up to 16 seconds at 1080p with stable, vivid results.
  • Works from text prompts or image inputs.

Seedance 1.5 Pro Draft Mode

FEATURE

Draft mode allows you to validate prompt intent and motion logic before full video generation.

  • Save up to 40% compared to normal 480p generation.
  • Fix prompts before compute-heavy rendering.
  • Iterate faster and cheaper.

PixVerse v5.6

MODEL

PixVerse v5.6 delivers more reliable single-shot video generation with improved audio.

  • More reliable single-shot image-to-video and text-to-video with audio.
  • Improved multi-character lip sync and dialogue pacing.
  • Cleaner and more stable motion.
  • More natural audio and stronger text-to-video consistency.

FLUX.2 [klein] 9B

MODEL

FLUX.2 [klein] 9B is the final model in the FLUX.2 klein family.

  • 4-step distilled image generation and editing.
  • Built for sub-second inference.
  • Text-to-image and editing handled in a single model.
  • Optimized for interactive apps and latency-sensitive production use.

TwinFlow Z-Image Turbo

MODEL

TwinFlow Z-Image Turbo is an image generation model optimized for fast inference at 4 steps.

  • Supports text-to-image generation with high-quality results.
  • Low latency for rapid iteration workflows.
  • Available via API only.

Bria FIBO Edit

MODEL

Bria FIBO Edit is an image editing model built around structured control using image, mask, and JSON instructions.

  • Edits applied exactly where specified.
  • Supports targeted edits, generative fill, outpainting, and compositional changes.
  • Controlled, reproducible edits with full commercial safety through Bria's licensed training data.

Wan2.6 Flash Image-to-Video

MODEL

Wan2.6 Flash is a faster, lower-cost variant of Wan2.6 for image-to-video workflows.

  • Designed for preview workflows and high-throughput creative pipelines.
  • Fast turnaround without sacrificing quality.

Kandinsky 5 Image

MODEL

Kandinsky 5 Image is an open-source model producing high-quality visuals from text.

  • Supports text-to-image and image-to-image workflows with reference images.
  • Optimized for clarity and strong text adherence.
  • Diverse visual concepts.
  • Available via API only.

PixVerse v5 Fast

MODEL

PixVerse v5 Fast delivers the same quality as PixVerse v5, optimized for speed.

  • Text-to-video and image-to-video support.
  • Ideal for rapid iteration and previews where turnaround matters.

Wan2.6 Image

MODEL

Wan2.6 Image offers strong prompt adherence with spatial structure preservation.

  • Guide style, subject, or background using reference images.
  • Supports text-to-image and image-to-image workflows.
  • Multiple reference images supported.

Bria FIBO Lite

MODEL

Bria FIBO Lite is a lightweight, efficient text-to-image model designed for rapid image generation.

Veo 3.1 4K and 9:16 Support

FEATURE

Veo 3.1 now supports new output formats for high-resolution production workflows.

  • Native 9:16 output for Instagram and TikTok-ready clips.
  • State-of-the-art upscaling to 1080p and 4K resolution.

Z-Image Turbo Image-to-Image

FEATURE

Z-Image Turbo now supports Image-to-Image workflows.

  • Upload an image, add your prompt, and get fast, high-quality transformations.

FLUX.2 [klein] Models

MODEL

Three new FLUX.2 [klein] models for fast, flexible image generation and editing.

4B Base

Compact and undistilled, with SOTA quality for its size. Optimized for fine-tuning and custom pipelines.

4B

Distilled for near real-time generation and editing. Ideal for production deployments and latency-critical workflows.

9B Base

Maximum flexibility and control for advanced fine-tuning and post-training workflows.

ImagineArt 1.5 Pro

MODEL

ImagineArt 1.5 Pro is an evolution of ImagineArt 1.5 with enhanced hyper-realism.

  • Outputs native 4K images with sharper textures.
  • More realistic lighting.
  • Built for production-ready image generation.
  • Ideal for posters, layouts, marketing visuals, and design-sensitive formats.

Seedance 1.5 Pro 1080p Support

FEATURE

Seedance 1.5 Pro now supports 1080p video generation with synchronized audio.

  • Generate 1080p video with synchronized audio support.
  • Works directly from text or image inputs.
  • Precise audio-visual timing with strong motion coherence.
  • Expressive camera control.
  • Advanced narrative prompt handling for short-form video creation.

Creatify Aurora v1

MODEL

Creatify Aurora v1 turns a single image and audio input into a realistic avatar video.

  • Accurate lip sync with natural facial motion.
  • Expressive performance generation.
  • Suitable for advertising, music videos, virtual humans, dubbing, and localization.

Qwen Image Models

MODEL

Three new Qwen image models for editing, generation, and layer decomposition.

Qwen Image Edit 2511

  • Preserves people and textures during edits.
  • Easily switch backgrounds while preserving key elements.

Qwen Image 2512

  • Major realism improvements with photorealistic humans.
  • Sharper natural details in landscapes, fur, and textures.
  • Accurate text rendering with better layout and composition.

Qwen Image Layered

  • Decompose images into editable RGBA layers.
  • Edit individual layers without affecting others.
  • Perfect for non-destructive editing workflows.

Wan2.2 Animate

MODEL

Wan2.2 Animate brings characters to life or swaps them entirely using reference-based motion.

  • Drive motion from a reference video.
  • Expressions and movement stay consistent.
  • Supports Animate Mode, Replace Mode, and Post Retargeting.

Kling VIDEO 2.6 Motion Control

MODEL

Kling VIDEO 2.6 with Motion Control handles complex and fast actions with reference-based character performance.

  • Drive a character's performance using a reference video.
  • Generate up to 30 seconds of one-take motion with synced expressions and lip sync.