Gemini Omni
Coming soon
Google's multimodal video model for generation and conversational editing
This model isn't available yet. Talk to our sales team about volume discounts, or follow us on X to see when it lands.
All workflows supported natively.
One integration for generation and editing, with no separate endpoints.
Why Gemini Omni stands out
Grounded in real-world knowledge
An intuitive grasp of physics, plus Gemini's knowledge of history, science, and culture, for believable scenes.
Location & world awareness
Understands real places and generates regionally accurate settings, down to local signage and language.
Character consistency across scenes
One reference image keeps a character or product consistent across scenes, angles, and lighting.
Character & object swaps
Replace characters or objects just by asking. Motion and dialogue carry over seamlessly.
Motion & style transfer
Apply motion from a source clip to a reference character, or restyle footage without regenerating the scene.
Synced text and audio
On-screen text and sound that connect coherently to what's happening in the frame.
Built for real workflows
Access Gemini Omni, Google's multimodal model that creates and edits video from text, image, video, and audio inputs, grounded in Gemini's real-world knowledge. Iterate through natural conversation, with every edit building on the last while keeping your scene consistent. It's built for real production work, not just isolated clips.
Storyboard to finished video
Upload a storyboard image and get a full video with dialogue and narrative.
Character-led storytelling
Turn a single character image into a multi-scene story with narration and captions.
Background & setting swaps
Drop a subject into a new environment, like a Bollywood dance scene.
Location-aware generation
Generate regionally correct outputs from real places, coordinates, signage, and native language.
Natural-language editing
Change one segment with a follow-up instruction, like swapping Sydney for the Taj Mahal.
Production at scale
Turn a few assets into many production-grade outputs fast.
How it compares
Where Gemini Omni sits next to other leading video models. Its edge is conversational, multi-turn editing on top of generation, grounded in Gemini's real-world knowledge.
| Model | Gemini Omni | Veo 3.1 | Kling VIDEO O3 | Seedance 2.0 | HappyHorse 1.0 |
|---|---|---|---|---|---|
| Input types | Text, image, audio, video | Text, image | Text, image, audio | Text, image, audio, video | Text, image |
| Native audio | Yes | Yes | Yes | Yes | No |
| Max resolution | 1080p (4K reportedly coming with Pro) | 4K | 4K | 1080p | 1080p |
| Max duration | 10s | 8s | 15s | 15s | 15s |
| Aspect ratios | 16:9, 9:16 | 16:9, 9:16 | 16:9, 9:16, 1:1 | 16:9, 9:16, 1:1, 4:3, 3:4, 21:9 | 16:9, 9:16, 1:1, ~4:3, ~3:4 |
| Best fit | Iterative, reference-based editing across mixed inputs | Cinematic generation, high-resolution output | Long-form, character-driven content in 4K | Audio-visual sync, multi-reference composition | High-quality no-audio generation |
| Capabilities | Conversational multi-turn editing, real-world knowledge & physics, character & scene consistency, generation & editing in one model | Real-world knowledge & physics, character & scene consistency; limited multi-turn editing and unified generation & editing | Character & scene consistency; limited multi-turn editing, real-world knowledge, and unified generation & editing | Character & scene consistency; limited real-world knowledge and unified generation & editing | Limited real-world knowledge and character consistency |
Specs and capabilities are indicative and evolve as each model updates.
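As a rough illustration of how the table can drive model choice, the sketch below transcribes four of its rows (input types, native audio, max duration, aspect ratios) into a plain Python dictionary and filters by job requirements. The `MODELS` structure and the `shortlist` helper are hypothetical names made up for this example, not part of any API, and the figures are only as current as the table itself.

```python
# Indicative specs transcribed from the comparison table; not an official data source.
MODELS = {
    "Gemini Omni": {
        "inputs": {"text", "image", "audio", "video"},
        "native_audio": True,
        "max_duration_s": 10,
        "aspect_ratios": {"16:9", "9:16"},
    },
    "Veo 3.1": {
        "inputs": {"text", "image"},
        "native_audio": True,
        "max_duration_s": 8,
        "aspect_ratios": {"16:9", "9:16"},
    },
    "Kling VIDEO O3": {
        "inputs": {"text", "image", "audio"},
        "native_audio": True,
        "max_duration_s": 15,
        "aspect_ratios": {"16:9", "9:16", "1:1"},
    },
    "Seedance 2.0": {
        "inputs": {"text", "image", "audio", "video"},
        "native_audio": True,
        "max_duration_s": 15,
        "aspect_ratios": {"16:9", "9:16", "1:1", "4:3", "3:4", "21:9"},
    },
    "HappyHorse 1.0": {
        "inputs": {"text", "image"},
        "native_audio": False,
        "max_duration_s": 15,
        # The table lists ~4:3 and ~3:4 as approximate, so they are left out here.
        "aspect_ratios": {"16:9", "9:16", "1:1"},
    },
}


def shortlist(inputs, duration_s, aspect_ratio, need_audio=False):
    """Return the models whose listed specs cover every requirement."""
    return [
        name
        for name, spec in MODELS.items()
        if set(inputs) <= spec["inputs"]
        and duration_s <= spec["max_duration_s"]
        and aspect_ratio in spec["aspect_ratios"]
        and (spec["native_audio"] or not need_audio)
    ]


# A 10-second 16:9 job that takes text and video as input.
print(shortlist({"text", "video"}, 10, "16:9"))
```

With the figures above, only Gemini Omni and Seedance 2.0 accept video input, so they are the two models this example returns. The filter covers hard limits only; the "Best fit" and "Capabilities" rows still need a human read.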
Frequently asked questions
What is Gemini Omni?
Google's multimodal model that generates and edits video from text, image, video, and audio inputs, grounded in Gemini's real-world knowledge. In Google's own shorthand: “Think of Gemini Omni like Nano Banana, but for video.” It's a conversational, iterative editing workflow, not just a generation tool.
What makes Gemini Omni different?
Conversational, multi-turn editing where each change builds on the last while keeping the scene consistent, plus an intuitive grasp of real-world physics and Gemini's knowledge of history, science, and culture.
What workflows does it support?
Text-to-video, image-to-video, reference-to-video, and multi-turn video editing, all through one Runware integration.
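To make the "one integration" idea concrete, here is a minimal sketch of how a single request shape could cover all four workflows. Since the model isn't available yet, everything in it is an assumption: the `VideoRequest` class, the `follow_up` helper, the field names, and the placeholder model ID are invented for illustration and do not reflect the published Runware request schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Placeholder only: the real model identifier has not been published.
MODEL_ID = "gemini-omni-placeholder"


@dataclass
class VideoRequest:
    """Hypothetical request shape shared by generation and editing."""

    prompt: str
    image: Optional[str] = None            # first frame or storyboard image
    reference_images: List[str] = field(default_factory=list)
    source_video: Optional[str] = None     # set only when editing existing footage

    def workflow(self) -> str:
        """Infer the workflow from which inputs are attached."""
        if self.source_video:
            return "video-editing"
        if self.reference_images:
            return "reference-to-video"
        if self.image:
            return "image-to-video"
        return "text-to-video"

    def to_payload(self) -> dict:
        """Build a payload with the same keys for every workflow (assumed field names)."""
        return {
            "model": MODEL_ID,
            "workflow": self.workflow(),
            "prompt": self.prompt,
            "image": self.image,
            "reference_images": list(self.reference_images),
            "source_video": self.source_video,
        }


def follow_up(previous_output: str, instruction: str) -> VideoRequest:
    """Multi-turn edit: the previous output becomes the source of the next request."""
    return VideoRequest(prompt=instruction, source_video=previous_output)


first = VideoRequest(prompt="A tourist walks past the Sydney Opera House at sunset")
edit = follow_up("clip-001", "Swap Sydney for the Taj Mahal")
print(first.workflow(), "->", edit.workflow())
```

The design point is that generation and editing differ only in which inputs are attached, so a follow-up instruction reuses the same builder with the previous clip as its source.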
How does Gemini Omni compare to other models?
Most video models focus on single-shot generation. Gemini Omni adds conversational, iterative editing on top of generation, so you can refine a scene over multiple turns instead of regenerating from scratch, while keeping characters and continuity consistent. On Runware it sits alongside other leading video models, so you can pick the right one for each task through the same API.
How much does Gemini Omni cost?
Gemini Omni isn't available on the platform yet, so pricing hasn't been published. Once it launches, rates will be listed on the Runware pricing page alongside our other video models. Contact us to be notified when it goes live.
Do I need separate integrations?
No. Generation and editing run through the same API.
When will Gemini Omni be released?
Gemini Omni is coming soon and isn't generally available yet. Request access through the form above, and we'll let you know as soon as you can start using it.
Talk to us about volume discounts
Have questions about Gemini Omni? Chat to our team about enterprise usage, including volume discounts and dedicated RPM, and we'll follow up shortly.