Vidu Q3 Turbo

Low-latency multimodal video generation with native audio

Vidu Q3 Turbo is a speed-optimized multimodal video generation model that produces short video clips with synchronized audio directly from text or images. It prioritizes fast inference and responsive iteration while preserving stable motion, coherent composition, and reliable audio alignment, making it suitable for rapid prototyping and production workflows where latency is critical.

Vidu
Commercial use
Text to Video · Image to Video
Pricing starts at $0.026/s at 540p, $0.039/s at 720p, and $0.052/s at 1080p.

Average savings vs typical market rates

540p · 5 s clip: $0.13 (save ~35%)
720p · 5 s clip: $0.195 (save ~35%)
1080p · 5 s clip: $0.26 (save ~35%)
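The 5-second figures above are the per-second rate multiplied by the clip duration. The sketch below reproduces that arithmetic for budgeting. It assumes billing is strictly linear per second, with no minimum duration and no rounding to whole seconds, which this page does not state. The helper name `clip_cost` is hypothetical and is not part of any Vidu API.

```python
# Per-second list prices in USD, copied from the pricing line above.
# Assumption: cost scales linearly with duration (no minimums or rounding).
RATES_USD_PER_SECOND = {
    "540p": 0.026,
    "720p": 0.039,
    "1080p": 0.052,
}


def clip_cost(resolution: str, seconds: float) -> float:
    """Hypothetical helper: estimate one clip's list price as rate x duration."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    try:
        rate = RATES_USD_PER_SECOND[resolution]
    except KeyError:
        raise ValueError(f"unknown resolution: {resolution!r}") from None
    # Round to 4 decimals to hide floating-point noise (e.g. 0.039 * 5).
    return round(rate * seconds, 4)


if __name__ == "__main__":
    for res in RATES_USD_PER_SECOND:
        print(res, clip_cost(res, 5))
```

For a 5-second clip this yields $0.13, $0.195, and $0.26, matching the list above. Actual charges may differ if the service bills in other increments.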

More models from this creator

Vidu Q3 is a multimodal video generation model that creates video with synchronized audio directly from text or images, supports intelligent multi-shot sequencing, and produces complete outputs with stable visuals and embedded subtitles without post-processing.

Vidu Q2 Turbo is the fast tier of the Q2 video model, aimed at rapid iteration in creative pipelines. It keeps the cinematic look of Vidu Q2 Pro while adding lower latency, stronger control over large motion, and smoother camera movement for prompt-driven video shots.

Vidu Q2 Pro is a high-fidelity video generation model for cinematic storytelling. It supports text prompts, image inputs, and multi-reference control for long-form scenes. It targets developers who need controllable motion, stable characters, and smooth camera work for complex shots.

Vidu Q1 Classic generates 1080p clips of up to 16 seconds from text prompts, source images, or reference shots. It targets controllable motion and stable scenes for fast prototyping, making it a good fit for teams that need cinematic tests without complex video pipelines.

Vidu Q1 is a generative video model that preserves visual fidelity from multiple reference images. It supports character, scene, and prop control with smooth transitions and 1080p clips, making it a good fit for ads, story sequences, and animation workflows that need tight visual continuity.

Vidu Q1 (image) is a reference-to-image model designed for high visual fidelity. It blends multiple input images while keeping identity and style consistent. Prompts can guide composition and layout without losing coherence. The model supports flexible aspect ratios for ads, social content, storyboards, or animation assets. It produces clean visuals with minimal effort and is useful for rapid creative workflows.

Vidu 2.0 is a generative video model for rapid 1080p clip creation. It targets 4-second and 8-second shots with strong subject consistency and supports batch workflows. Developers can drive cinematic clips from text prompts and templates, with faster generation and lower cost than earlier Vidu versions.

Vidu 2.0 Template lets developers define reusable templates that drive text-to-video generation. Templates configure social-ready scenes with a fixed structure, and choreography, camera motion, and visual style presets are controlled through simple parameters for fast, repeatable content.

Vidu 1.5 is a multimodal text-to-video model that focuses on multi-entity consistency across complex scenes. It keeps multiple characters and objects visually stable across frames and shots, so developers can build long-form video workflows that need coherent motion and style control.