
Changelog

The latest updates and improvements to the Runware platform.

ImagineArt 2.0

MODEL

ImagineArt 2.0 is a reasoning-based text-to-image model for instruction-faithful generation across realistic and stylized outputs.

  • Uses reasoning-based generation for more accurate and nuanced prompt following.
  • Supports a wide range of styles, from photorealistic imagery to illustration and anime.
  • New color pipeline improves vibrancy and true-to-life color rendering.

MiniMax Music 2.6 & Music Cover

MODEL

Two MiniMax music models are now available for song generation and melody-preserving audio transformation.

MiniMax Music 2.6

  • Generates full songs and instrumentals from short prompts or detailed production-style instructions.
  • Supports prompt-level control over key, emotional arc, and song structure, including sections such as intro, verse, chorus, and bridge.
  • Returns the first audio clip in under 20 seconds for faster iteration.

MiniMax Music Cover

  • Reworks an uploaded track in a new style while preserving the original melody.
  • Supports prompt-controlled changes to genre, instrumentation, arrangement, and vocal character.

PixVerse v6

MODEL

PixVerse v6 is now available for end-to-end video generation with native audio output.

  • Generate video with dialogue, sound effects, and music in a single pass.
  • Up to 15 seconds at up to 1080p.
  • Improved consistency for characters and smoother transitions across shots, with directed camera motion.
  • Currently ranked #4 for Image-to-Video on Artificial Analysis.

sync-3

MODEL

sync-3 is a lip sync model for full-shot generation with native 4K and super resolution support.

  • Processes the entire shot in one pass instead of chunking, improving consistency across frames.
  • Handles close-ups, extreme face angles, partially obscured faces, and obstructed mouths.
  • Preserves voice, tone, and emotional expression across 95+ languages.

GPT-5.4, 5.4 Mini & 5.4 Nano

MODEL

Three new OpenAI GPT-5.4 multimodal models for advanced reasoning, coding workflows, and high-volume automation.

GPT-5.4

  • Flagship reasoning model with a 1 million token context window.
  • Integrates native computer use and coding capabilities with configurable reasoning effort for complex tasks.
  • Improved factual accuracy suitable for production-grade writing, analysis, and execution.

GPT-5.4 Mini

  • Efficient, lower-cost variant built for coding assistants and subagent orchestration.
  • Supports a 400K token context window with native computer use and faster responsiveness.
  • Balances high-quality text and image understanding for mid-scale workloads.

GPT-5.4 Nano

  • Ultra-low-latency, lightweight model for high-throughput classification, data extraction, and ranking.
  • Optimized for maximum speed and cost efficiency in high-volume, continuous workflows.

ACE-Step v1.5

MODEL

ACE-Step v1.5 is an open-source music generation model with voice cloning, lyric editing, and multilingual support.

  • Generates full tracks from text prompts with support for voice cloning, remixing, and cover generation.
  • Text-to-audio and audio-to-audio workflows across a wide range of genres.
  • Compositions up to 10 minutes with support for over 50 languages.
  • Turbo variant available for faster generation.

Exactly.ai Model Family

MODEL

16 new Exactly.ai models for style-consistent image generation across illustration and photography.

  • 10 illustration styles and 6 photographic styles, each trained with a specific artist for consistent output across generations.
  • Outputs stay visually coherent without drifting between styles across batches.
  • Suited for creative agencies, marketing teams, and content platforms generating images at scale with a consistent visual identity.

Wan2.7 Video

MODEL

Wan2.7 is Alibaba's most capable video model, supporting text-to-video, image-to-video, reference-to-video, and instruction-based video editing.

  • Instruction-based video editing to add, remove, or swap elements, change backgrounds, and restyle scenes from text.
  • Reuse motion sequences, camera movements, and visual styles from reference videos.
  • Lock character appearance and voice across shots with up to 5 subject references and 40+ facial expressions.
  • Output at 720p or 1080p, 30fps with native audio and auto-dubbing support.

P-Image Upscale

MODEL

P-Image Upscale is an AI-powered image upscaling model that enhances resolution while preserving detail and realism.

  • Upscale images to a target resolution between 1 and 8 megapixels.
  • Optional detail enhancement to sharpen textures and fine elements.
  • Realism enhancement to improve photographic quality and depth.
  • Sub-second inference for fast, scalable upscaling workflows.
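
As a rough sketch of how such an upscale request might be parameterized — the task and field names below are illustrative assumptions, not the documented Runware API:

```python
import json
import uuid

def build_upscale_task(image_url, megapixels,
                       enhance_detail=False, enhance_realism=False):
    """Build a hypothetical upscale request payload.

    The target resolution is expressed in megapixels (1-8), matching
    the range described above; all field names are placeholders.
    """
    if not 1 <= megapixels <= 8:
        raise ValueError("target resolution must be between 1 and 8 megapixels")
    return {
        "taskType": "imageUpscale",       # illustrative task name
        "taskUUID": str(uuid.uuid4()),
        "inputImage": image_url,
        "targetMegapixels": megapixels,   # illustrative field name
        "enhanceDetail": enhance_detail,
        "enhanceRealism": enhance_realism,
    }

task = build_upscale_task("https://example.com/photo.jpg", 4,
                          enhance_detail=True)
print(json.dumps(task, indent=2))
```

The validation mirrors the 1–8 megapixel range stated above, so an out-of-range request fails before being sent.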

Veo 3.1 Lite

MODEL

Veo 3.1 Lite is the most cost-effective model in the Veo family, matching Veo 3.1 Fast speed at roughly half the price.

  • Text-to-video, image-to-video, and reference-to-video with 4s, 6s, and 8s output durations.
  • Native audio generation with dialogue, sound effects, and ambient audio synchronized with video.
  • Supports 16:9 and 9:16 aspect ratios at up to 1080p resolution.

Wan2.7 Image

MODEL

Wan2.7 Image is a unified image generation and editing model with improved human figures, text rendering, and color output.

  • Sharper human figure generation with more realistic proportions and detail.
  • Improved multilingual text rendering supporting up to 3,000 tokens in 12 languages.
  • More vibrant, saturated color output compared to previous Wan versions.
  • Supports text-to-image, image-to-image, and region-based editing in a single model.
  • Pro variant available for higher-fidelity output.

Topaz Labs Starlight Precise 2.5

MODEL

Topaz Labs Starlight Precise 2.5 is a diffusion-based video upscaling model with high-fidelity detail preservation.

  • Upscale video to 1080p or 4K, retaining the fine textures and edges that other models soften.
  • Full temporal consistency across frames with natural skin textures and faces.
  • Suited for film restoration, content upscaling pipelines, and production-grade video enhancement.

FLUX.2 [klein] 9B KV

MODEL

FLUX.2 [klein] 9B KV is a KV-cache optimized variant of Klein 9B with up to 2.5x faster multi-reference editing.

  • All capabilities of FLUX.2 [klein] 9B: sub-second generation, text-to-image, and multi-reference editing in a single model.
  • KV-cache support delivers up to 2.5x faster inference for multi-reference editing workflows.
  • Built for interactive apps and real-time editing pipelines.

xAI Text-to-Speech

MODEL

xAI Text-to-Speech is an expressive, low-latency text-to-speech model with multilingual support.

  • Natural speech synthesis with inline controls for pauses, laughter, whispers, and emphasis.
  • 5 distinct voices (Ara, Eve, Leo, Rex, Sal) across 20+ languages with automatic language detection.
  • Low-latency synthesis suited for voice agents, real-time applications, and multilingual content at scale.
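
Only the five voice names come from the entry above; the payload shape below is a hedged sketch, not the documented request format:

```python
import uuid

# The five voices listed for xAI Text-to-Speech.
VOICES = {"Ara", "Eve", "Leo", "Rex", "Sal"}

def build_tts_task(text, voice="Ara"):
    """Build a hypothetical TTS request payload.

    Rejects unknown voices up front; field names are placeholders.
    """
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; choose one of {sorted(VOICES)}")
    return {
        "taskType": "audioInference",  # illustrative task name
        "taskUUID": str(uuid.uuid4()),
        "text": text,
        "voice": voice,
    }

task = build_tts_task("Welcome to the changelog.", voice="Eve")
```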

HeyGen Video Agent & Avatar IV

MODEL

Two HeyGen models for end-to-end AI video creation and photorealistic avatar generation.

HeyGen Video Agent

  • Prompt-to-video pipeline with automated scripting, voiceover, avatar selection, B-roll, and editing.
  • Designed for marketing, sales, training, and social content at scale.

HeyGen Avatar IV

  • Photorealistic avatars with micro-expressions, natural gestures, and audio-driven emotion.
  • Granular expressiveness controls with faster generation times.
  • Suited for enterprise training, multilingual content, and scalable avatar-based video.

Seedance 1.0 Lite & Seedream 3.0 Deprecated

DEPRECATION

ByteDance is deprecating Seedance 1.0 Lite and Seedream 3.0. These models will be fully deactivated on May 13, 2026, with capacity gradually reduced starting March 23.

If you are using these models, please migrate to the recommended replacements before the deactivation date.


TRELLIS.2 & SAM 3D Objects

MODEL

Two new 3D generation models for creating production-ready assets from single images.

TRELLIS.2

  • High-resolution 3D asset generation from a single 2D image input.
  • Produces fully textured, production-ready meshes.
  • Ideal for game assets, VFX, product design, and architectural visualization.

SAM 3D Objects

  • Reconstructs full 3D geometry, realistic textures, and scene layout from a single RGB image.
  • High-fidelity mesh output even in cluttered, occluded real-world scenes.
  • Suited for AR/VR asset creation, e-commerce product visualization, and scene understanding.

Gemini 3.1 Pro, 3.1 Flash Lite & 3 Flash

MODEL

Three Gemini LLMs from Google are now available on Runware.

Gemini 3.1 Pro

  • Google's most capable reasoning model with 77.1% on ARC-AGI-2 and 94.3% on GPQA Diamond.
  • Natively multimodal with text, image, audio, video, and code inputs.
  • Up to 2 million token context window for processing entire codebases, books, or hours of video.
  • Configurable thinking levels for balancing reasoning depth against cost and speed.
  • Enhanced agentic workflows for autonomous development and long-horizon tasks.

Gemini 3.1 Flash Lite

  • Optimized for high-volume, low-latency workloads at a fraction of the cost of Pro.
  • 2.5x faster time-to-first-token and over 360 tokens per second output speed.
  • Up to 1 million token context window with multimodal input support.
  • Configurable thinking levels for fine-grained control over reasoning depth.

Gemini 3 Flash

  • Balanced speed and capability with Gemini 3 Pro-level reasoning at Flash efficiency.
  • 200K token context window with text, image, audio, video, and code inputs.
  • 78% on SWE-bench Verified for strong coding performance across large codebases.

MiniMax M2.7 & M2.5 LLM

MODEL

MiniMax M2.7 and M2.5 are the first LLMs available on Runware, built for complex agentic workflows and real-world productivity.

MiniMax M2.7

  • Near Opus-level software engineering with 56.22% on SWE-Pro.
  • 97% skill adherence rate across 40+ complex, long-horizon tasks.
  • Self-evolving architecture with 30% performance gains across 100+ reinforcement learning iterations.
  • Full end-to-end project delivery including debugging, code security, and refactoring.
  • Highspeed variant available for lower latency.

MiniMax M2.5

  • 80.2% on SWE-Bench Verified with strong agentic performance beyond code generation.
  • 200K context window with built-in thinking mode.
  • Optimized for speed and cost across multi-step tool use tasks.

Grok Imagine Image Pro

MODEL

Grok Imagine Image Pro is xAI's most capable image generation model with higher quality, greater prompt adherence, and finer detail.

  • Photorealistic and stylized image generation from text prompts.
  • Native 2K resolution output for print, product visuals, and professional creative pipelines.
  • Multi-turn iterative refinement with natural language instructions.

Inworld TTS-1.5

MODEL

Inworld TTS-1.5 is a low-latency text-to-speech model available in Mini and Max variants, built for real-time AI applications at scale.

  • Production-grade latency with P90 time-to-first-audio under 250ms (Max) and under 130ms (Mini).
  • 15+ languages supported.
  • Ideal for conversational AI agents, live translation, interactive media, and accessibility tools.

Kling VIDEO 3.0 Motion Control

FEATURE

Kling VIDEO 3.0 now supports Motion Control with facial identity preservation and emotion reproduction.

  • Consistent facial identity preserved across angles and through occlusions.
  • Complex emotion reproduction with high-fidelity restoration.

LTX-2.3

MODEL

LTX-2.3 is a video generation model with a rebuilt VAE, stronger prompt handling, and improved audio-video sync.

  • Rebuilt VAE for improved textures, edge detail, and on-screen text clarity.
  • 4× larger text connector for complex multi-subject prompts with reduced drift.
  • Frame rate options at 24, 25, 48, and 50fps.
  • Significantly reduced audio artifacts with tighter audio-video sync.
  • Fast variant available for lower-latency generation.

Qwen-Image-2.0 and Qwen-Image-2.0-Pro

MODEL

Qwen-Image-2.0 and Qwen-Image-2.0-Pro are unified text-to-image and image editing models with native 2K resolution.

  • Unified generation and editing in a single model.
  • Native 2K resolution with fine detail in skin, fabric, and architecture.
  • Professional typography rendering supporting up to 1,000-token prompts for infographics, posters, slides, and comics.

Kling IMAGE 3.0 and IMAGE O3

MODEL

Two new Kling image models for professional-grade generation and high-fidelity reference-based workflows.

Kling IMAGE 3.0

  • Native 2K resolution with enhanced texture, lighting, and material handling.
  • Image-to-image editing with consistency across iterative subject and layout refinements.

Kling IMAGE O3

  • High-fidelity generation at up to 4K resolution.
  • Supports up to 10 reference images.
  • Face and identity preservation for character consistency.

MiniMax Speech 2.8

MODEL

MiniMax Speech 2.8 is a text-to-speech model with consistent tone across long-form scripts.

  • Consistent voice quality across extended scripts without tonal drift.
  • Low latency suitable for real-time applications and voice agents.
  • Turbo variant available for faster, lower-cost generation.

Nano Banana 2

MODEL

Nano Banana 2 delivers improved reasoning, rendering, and layout control for text-to-image generation.

  • Stronger prompt adherence with real-world scene reasoning and stable multi-subject composition.
  • State-of-the-art text rendering for posters, UI, and packaging.
  • Improved material and lighting realism.
  • Up to 4K output.

P-Video

MODEL

P-Video is an image-to-video model optimized for fast iteration with a draft-to-final rendering workflow.

  • Draft mode generates preview clips at lower cost, with seed reuse for full-quality final renders.
  • Strong input-image consistency with reliable lip sync and native dialogue.
  • High-quality subject and background rendering, including low-resolution asset animation.
  • Up to 1080p resolution at 48fps.
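
The draft-to-final workflow above can be sketched as two requests sharing a seed — a hypothetical payload shape, since the entry does not specify field names:

```python
import random

def build_video_task(prompt, image_url, draft=True, seed=None):
    """Build a hypothetical P-Video request.

    A draft render picks a random seed; reusing that seed in a final
    render reproduces the same motion at full quality, per the
    draft-to-final workflow described above. Field names are placeholders.
    """
    return {
        "taskType": "videoInference",   # illustrative task name
        "inputImage": image_url,
        "positivePrompt": prompt,
        "draftMode": draft,             # illustrative field name
        "seed": seed if seed is not None else random.randrange(2**32),
    }

draft = build_video_task("a fox running through snow",
                         "https://example.com/fox.png")
# Promote the draft to a full-quality render by reusing its seed.
final = build_video_task("a fox running through snow",
                         "https://example.com/fox.png",
                         draft=False, seed=draft["seed"])
```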

Bria FIBO Edit Tools

MODEL

Bria FIBO Edit Tools is a structured image editing model with optional mask-based targeting for localized changes.

  • Preserves original content outside the edit region.
  • Supports recoloring, relighting, restoration, image blending, seasonal effects, and sketch generation.

Seedream 5.0 Lite

MODEL

Seedream 5.0 Lite is a text-to-image and image-to-image model with native 4K output and integrated web search.

  • High-detail image generation up to 4K resolution.
  • Strong text rendering for posters, UI mockups, and branded assets.
  • Consistent multi-image outputs with stable character and layout coherence across batches.
  • Integrates live web search to reflect current events and trending topics in generations.

Kling VIDEO O3

MODEL

Kling VIDEO O3 is now available on the platform in both Standard and Pro variants.

  • Unified text-to-video, image-to-video, and video-to-video workflows.
  • Generates HD video with synchronized native audio.
  • Supports reference-driven generation for stronger subject control.
  • Includes prompt-based video editing with strong temporal consistency to maintain subject stability across clips.

Recraft V4 and V4 Pro

MODEL

Recraft V4 and V4 Pro are professional-grade text-to-image models built for brand and campaign work.

  • Photorealistic and illustration output with structured composition.
  • Native 2K generation with style and color control.

Kling VIDEO 3.0

MODEL

Kling VIDEO 3.0 is available in Standard and Pro variants for cinematic multi-shot video generation with native audio.

  • Text-to-video and image-to-video support.
  • Multi-shot generation with manual or automatic scene planning.
  • Cross-shot consistency for characters and voices across scene cuts.
  • Multi-language audio with lip sync.
  • Up to 15 seconds with professional-grade resolution and reduced artifacts.
  • Native text rendering for signage, captions, and advertising.
  • Audio available as an optional add-on for both variants.

P-Image LoRA and P-Image-Edit LoRA

FEATURE

LoRA support is now available for P-Image and P-Image-Edit.

  • Apply trained or community LoRAs for custom styles, characters, and concepts.
  • Compatible with any HuggingFace LoRA without additional setup.
  • P-Image-Edit LoRA supports style transfers and targeted image transformations.
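
A minimal sketch of attaching a LoRA to a generation request — Runware-style task fields are assumed, and the model and LoRA identifiers are placeholders:

```python
def build_image_task_with_lora(prompt, model, lora_id, weight=1.0):
    """Attach a single LoRA to a hypothetical image generation payload.

    `lora` is a list so multiple adapters could be stacked; the exact
    field names here are assumptions rather than the documented API.
    """
    return {
        "taskType": "imageInference",
        "positivePrompt": prompt,
        "model": model,                               # placeholder model ID
        "lora": [{"model": lora_id, "weight": weight}],
    }

task = build_image_task_with_lora(
    "watercolor portrait of a lighthouse",
    "p-image-placeholder",            # hypothetical P-Image identifier
    "hf-community/watercolor-lora",   # hypothetical HuggingFace LoRA
    weight=0.8,
)
```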

Wan2.6 Flash Reference-to-Video

MODEL

Wan2.6 Flash R2V generates video from short reference clips while preserving visual identity and voice.

  • Automatic lip sync and voice replication from the reference clip.
  • Available at 720p and 1080p with optional audio output.
  • Flash variant optimized for faster inference.

GLM-Image

MODEL

GLM-Image is an open-source 16B parameter hybrid autoregressive and diffusion model.

  • Unified text-to-image and image editing in a single model.
  • Industry-leading text rendering accuracy with strong prompt adherence across photorealistic and stylistic outputs.

Z-Image

MODEL

Z-Image is an open-source 6B parameter diffusion transformer for text-to-image and image-to-image generation.

  • High visual fidelity with strong prompt adherence across diverse styles.
  • Foundation model for distilled variants including Z-Image Turbo.

Vidu Q3 Turbo

MODEL

Vidu Q3 Turbo is a faster variant of Vidu Q3 optimized for rapid iteration.

  • Up to 16 seconds at native 1080p with first and last frame support.
  • Native audio-video sync with dialogue, ambient sounds, and BGM generated in a single pass.
  • Available at 540p, 720p, and 1080p output.

Grok Imagine Image & Video

MODEL

Two new Grok Imagine models for photorealistic image generation and short-form video.

Grok Imagine Image

  • Photorealistic image generation with text-to-image and image-to-image support.

Grok Imagine Video

  • Short-form text-to-video and image-to-video generation.
  • Dynamic motion and camera behavior with native audio output.
  • Output available at 480p and 720p.

Runway Gen-4.5

MODEL

Runway Gen-4.5 delivers unprecedented physical accuracy and expressive character generation.

  • Realistic weight, momentum, and fluid dynamics.
  • Expressive characters with nuanced emotions and natural gestures.
  • Strong prompt adherence for complex camera choreography and scene composition.
  • Text-to-video and image-to-video supported.

LTX-2 19B

MODEL

LTX-2 19B is a production-ready model with synchronized audio generation.

  • Native 4K resolution at 50fps.
  • Up to 20 seconds duration.
  • Prompt enhancer enabled by default to improve video quality.

Reve Image

MODEL

Reve Image is an all-in-one image model for generating, editing, and remixing visuals.

  • Supports style variation and layout changes.
  • Handles semantic edits without rebuilding from scratch.
  • Designed for iterative visual workflows.

Riverflow 2.0

MODEL

Riverflow 2.0 offers reference-based super resolution and custom font control.

  • Reference-based super resolution with high accuracy for product and marketing images.
  • Custom font control with reliable text rendering.
  • Consistent outputs: images are checked and corrected before being returned.

Sora 2 Remix Capability

FEATURE

Remix capability allows you to refine and modify existing videos without regenerating from scratch.

  • Preserves original structure and camera movement for precise control.
  • Suitable for color grading, style transformation, and element replacement.
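
Conceptually, a remix request references an already-generated video instead of starting from scratch. The sketch below is purely illustrative — field names and the strength parameter are assumptions, not the documented API:

```python
def build_remix_task(source_video_id, prompt, strength=0.5):
    """Build a hypothetical remix request.

    `strength` (0-1) is an assumed knob for how far the remix may
    depart from the source while preserving structure and camera
    movement, as described above.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return {
        "taskType": "videoInference",      # illustrative task name
        "referenceVideo": source_video_id, # illustrative field name
        "positivePrompt": prompt,
        "remixStrength": strength,
    }

task = build_remix_task("video-1234", "regrade the scene for golden hour", 0.3)
```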

Vidu Q3

MODEL

Vidu Q3 generates synchronized video and audio from text or images.

  • Sound effects, dialogue, and subtitles in a single output — no stitching required.
  • Multi-shot support with automatic shot planning or prompt-based control.
  • Up to 16 seconds at 1080p with stable, vivid results.
  • Works from text prompts or image inputs.

Seedance 1.5 Pro Draft Mode

FEATURE

Draft mode allows you to validate prompt intent and motion logic before full video generation.

  • Save up to 40% compared to normal 480p generation.
  • Fix prompts before compute-heavy rendering.
  • Iterate faster and cheaper.

PixVerse v5.6

MODEL

PixVerse v5.6 delivers more reliable single-shot video generation with improved audio.

  • More reliable single-shot image-to-video and text-to-video with audio.
  • Improved multi-character lip sync and dialogue pacing.
  • Cleaner and more stable motion.
  • More natural audio and stronger text-to-video consistency.

FLUX.2 [klein] 9B

MODEL

FLUX.2 [klein] 9B is the final model in the FLUX.2 klein family.

  • 4-step distilled image generation and editing.
  • Built for sub-second inference.
  • Text-to-image and editing handled in a single model.
  • Optimized for interactive apps and latency-sensitive production use.

TwinFlow Z-Image Turbo

MODEL

TwinFlow Z-Image Turbo is an image generation model optimized for fast inference at 4 steps.

  • Supports text-to-image generation with high-quality results.
  • Low latency for rapid iteration workflows.
  • Available via API only.

Bria FIBO Edit

MODEL

Bria FIBO Edit is an image editing model built around structured control using image, mask, and JSON instructions.

  • Edits applied exactly where specified.
  • Supports targeted edits, generative fill, outpainting, and compositional changes.
  • Controlled, reproducible edits with full commercial safety through Bria's licensed training data.

Wan2.6 Flash Image-to-Video

MODEL

Wan2.6 Flash is a faster, lower-cost variant of Wan2.6 for image-to-video workflows.

  • Designed for preview workflows and high-throughput creative pipelines.
  • Fast turnaround without sacrificing quality.

Kandinsky 5 Image

MODEL

Kandinsky 5 Image is an open-source model producing high-quality visuals from text.

  • Supports text-to-image and image-to-image workflows with reference images.
  • Optimized for clarity and strong text adherence.
  • Diverse visual concepts.
  • Available via API only.

PixVerse v5 Fast

MODEL

PixVerse v5 Fast delivers the same quality as PixVerse v5, optimized for speed.

  • Text-to-video and image-to-video support.
  • Ideal for rapid iteration and previews where turnaround matters.

Wan2.6 Image

MODEL

Wan2.6 Image offers strong prompt adherence with spatial structure preservation.

  • Guide style, subject, or background using reference images.
  • Supports text-to-image and image-to-image workflows.
  • Multiple reference images supported.

Bria FIBO Lite

MODEL

Bria FIBO Lite is a lightweight, efficient text-to-image model designed for rapid image generation.

Veo 3.1 4K and 9:16 Support

FEATURE

Veo 3.1 now supports new output formats for high-resolution production workflows.

  • Native 9:16 output for Instagram and TikTok-ready clips.
  • State-of-the-art upscaling to 1080p and 4K resolution.

Z-Image Turbo Image-to-Image

FEATURE

Z-Image Turbo now supports Image-to-Image workflows.

  • Upload an image, add your prompt, and get fast, high-quality transformations.

FLUX.2 [klein] Models

MODEL

Three new FLUX.2 [klein] models for fast, flexible image generation and editing.

4B Base

Compact and undistilled, with SOTA quality for its size. Optimized for fine-tuning and custom pipelines.

4B

Distilled for near real-time generation and editing. Ideal for production deployments and latency-critical workflows.

9B Base

Maximum flexibility and control for advanced fine-tuning and post-training workflows.

ImagineArt 1.5 Pro

MODEL

ImagineArt 1.5 Pro is an evolution of ImagineArt 1.5 with enhanced hyper-realism.

  • Outputs native 4K images with sharper textures.
  • More realistic lighting.
  • Built for production-ready image generation.
  • Ideal for posters, layouts, marketing visuals, and design-sensitive formats.

Seedance 1.5 Pro 1080p Support

FEATURE

Seedance 1.5 Pro now supports 1080p video generation with synchronized audio.

  • Generate 1080p video with synchronized audio support.
  • Works directly from text or image inputs.
  • Precise audio-visual timing with strong motion coherence.
  • Expressive camera control.
  • Advanced narrative prompt handling for short-form video creation.

Creatify Aurora v1

MODEL

Creatify Aurora v1 turns a single image and audio input into a realistic avatar video.

  • Accurate lip sync with natural facial motion.
  • Expressive performance generation.
  • Suitable for advertising, music videos, virtual humans, dubbing, and localization.

Qwen Image Models

MODEL

Three new Qwen image models for editing, generation, and layer decomposition.

Qwen Image Edit 2511

  • Preserves people and textures during edits.
  • Easily switch backgrounds while preserving key elements.

Qwen Image 2512

  • Major realism improvements with photorealistic humans.
  • Sharper natural details in landscapes, fur, and textures.
  • Accurate text rendering with better layout and composition.

Qwen Image Layered

  • Decompose images into editable RGBA layers.
  • Edit individual layers without affecting others.
  • Perfect for non-destructive editing workflows.

Wan2.2 Animate

MODEL

Wan2.2 Animate brings characters to life or swaps them entirely using reference-based motion.

  • Drive motion from a reference video.
  • Expressions and movement stay consistent.
  • Supports Animate Mode, Replace Mode, and Post Retargeting.

Kling VIDEO 2.6 Motion Control

MODEL

Kling VIDEO 2.6 with Motion Control handles complex and fast actions with reference-based character performance.

  • Drive a character's performance using a reference video.
  • Generate up to 30 seconds of one-take motion with synced expressions and lip sync.