Best for Anime

A curated set of models tuned for anime and manga aesthetics, including expressive faces, stylised linework, and bold colour. Great for character art, detailed scenes, and illustration-heavy creative outputs.

Launch model

Top Pick

Launch View details

Best rated

Seedance 2.0

by ByteDance

Seedance 2.0 is a unified multimodal audio-video generation model from ByteDance that accepts text, image, audio, and video inputs in combination, supporting up to 9 images, 3 video clips, and 3 audio clips as reference. It generates multi-shot videos up to 15 seconds with dual-channel synchronized audio including dialogue, ambient sound, and effects. It features physics-aware motion, improved controllability for video extension and editing, and strong instruction following for complex scene composition.

Featured Models

Top-performing models in this category, recommended by our community and performance benchmarks.

Launch View details

Seedance 2.0 Fast

by ByteDance

Seedance 2.0 Fast is a speed-optimized variant of ByteDance's unified multimodal audio-video generation model. It accepts text, image, audio, and video inputs in combination, like Seedance 2.0, but targets shorter wall-clock times and higher throughput for iterative workflows. It produces multi-shot videos with dual-channel synchronized audio including dialogue, ambient sound, and effects, with physics-aware motion and editing controls, while prioritizing responsiveness over the last increment of visual refinement so teams can preview and ship ideas faster.

Launch View details

Kling VIDEO 3.0 Pro

by Kling AI

Kling VIDEO 3.0 Pro is a unified multimodal video model that generates high-quality video with synchronized audio from text or images. It supports reference-guided generation, prompt-based editing, fine control over motion and pacing, and stable temporal coherence for cinematic and narrative clips. Native audio output includes dialogue, ambient sound, and effects aligned to the visuals.

Launch View details

Kling VIDEO 3.0 Standard

by Kling AI

Kling VIDEO 3.0 Standard generates synchronized video and audio from text and images with a balance of quality, speed, and cost. It supports reference-based generation and prompt-driven edits while maintaining temporal stability and clear motion. Native audio output includes dialogue and ambient sound that aligns with the visual content.

Launch View details

Veo 3.1

by Google

Veo 3.1 is a cinematic video generation model for developers. It turns text prompts or reference images into high fidelity scenes with richer native audio, better prompt adherence, and granular shot control. Use it for story driven clips with smoother motion and consistent style.

View details

Api Only

Veo 3.1 Lite

by Google

Veo 3.1 Lite is the most cost-effective model in the Veo 3.1 family, designed for high-volume applications requiring rapid iteration. It supports text-to-video and image-to-video generation at 720p or 1080p in landscape and portrait formats, with customizable duration of 4, 6, or 8 seconds. It maintains the same generation speed as Veo 3.1 Fast at less than 50% of the cost, and includes native synchronized audio generation.

Launch View details

Wan2.6

by Alibaba

Wan2.6 is a multimodal video model for text to video and image to video generation with support for multi-shot sequencing and native sound. It emphasizes temporal stability, consistent visual structure across shots, and reliable alignment between visuals and audio in short form video generation.

Launch View details

Wan2.7

by Alibaba

Wan2.7 is Alibaba's next-generation multimodal video model supporting text-to-video, image-to-video, reference-to-video, and video editing. It features multi-shot storytelling, subject-consistent multi-character generation, first-and-last-frame interpolation, video continuation, style transfer, instruction-based editing, and audio-conditioned generation with auto-dubbing. Output at 720p or 1080p, 30 FPS in multiple aspect ratios.

Launch View details

Seedream 5.0 Lite

by ByteDance

Seedream 5.0 Lite is an advanced image generation model from ByteDance that produces high-quality still images from text prompts while providing flexibility for editing workflows. It is designed to combine expressive creativity with precise control over layout, composition, styles, and details, interpreting nuanced instructions faithfully. Users can incorporate a single reference image to guide generation or editing. Integrated search and reasoning features let the model visualize real-time trends and domain information in the output.

#10

Launch View details

Ideogram 3.0

by Ideogram

Ideogram 3.0 is a text to image model for high fidelity design work. It improves text rendering, complex layout handling, and photorealism. It also adds stronger style controls and supports editing tasks like inpainting and background replacement for production workflows.

#11

Launch View details

GPT Image 2

by OpenAI

GPT Image 2 is a general-purpose GPT Image family model for text-to-image generation and image editing. Its strengths include strong prompt adherence, readable embedded text, detailed edits, photorealistic rendering, and structured visual outputs such as posters, packaging, product comps, diagrams, and other layout-sensitive images.

#12

Launch View details

GPT Image 1.5

by OpenAI

GPT Image 1.5 is OpenAI’s newest flagship image model powering the latest ChatGPT Images. It delivers significantly faster image generation with stronger instruction following, more precise edits that preserve original details, more believable transformations, and improved rendering of dense or small text. It is suited for practical creative workflows, detailed design tasks, and production use cases.

#13

Launch View details

FLUX.2 [pro]

by Black Forest Labs

FLUX.2 [pro] is a flow-matching latent transformer for precise text-to-image synthesis and reference-guided editing. It supports multi image references, 4MP outputs, and Mistral-based text conditioning for controllable composition and robust iterative edits that preserve structure.

#14

Launch View details

FLUX.2 [max]

by Black Forest Labs

FLUX.2 [max] is a high-precision text to image and image editing model from Black Forest Labs that generates visuals grounded in real-time information via live web search. It delivers maximum prompt adherence with multi-reference editing and state-of-the-art consistency across identities, objects, and details.

#15

View details

Api Only

Qwen-Image-2512

by Alibaba

Qwen-Image-2512 is an improved version of the Qwen-Image image foundation model with enhanced prompt understanding, superior text rendering accuracy, and more realistic visual details. It generates high-fidelity images from text prompts across diverse subjects and styles.

Best for Anime

Seedance 2.0

Featured Models

Seedance 2.0 Fast

Kling VIDEO 3.0 Pro

Kling VIDEO 3.0 Standard

Veo 3.1

Veo 3.1 Lite

Wan2.6

Wan2.7

Seedream 5.0 Lite

Ideogram 3.0

GPT Image 2

GPT Image 1.5

FLUX.2 [pro]

FLUX.2 [max]

Qwen-Image-2512

Explore other collections