Gemini Omni

Coming soon

Google's multimodal video model for generation and conversational editing

This model isn't available yet. Talk to our sales team about volume discounts, or follow us on X to see when it lands.

All workflows supported natively.

Text-to-video, image-to-video, reference-to-video, and multi-turn video editing.

One integration for generation and editing, with no separate endpoints.

Why Gemini Omni stands out

Grounded in real-world knowledge

An intuitive grasp of physics, plus Gemini's knowledge of history, science, and culture, for believable scenes.

Location & world awareness

Understands real places and generates regionally accurate settings, down to local signage and language.

Character consistency across scenes

One reference image keeps a character or product consistent across scenes, angles, and lighting.

Character & object swaps

Replace characters or objects just by asking. Motion and dialogue carry over seamlessly.

Motion & style transfer

Apply motion from a source clip to a reference character, or restyle footage without regenerating the scene.

Synced text and audio

On-screen text and sound that connect coherently to what's happening in the frame.

Built for real workflows

Access Gemini Omni, Google's multimodal model that creates and edits video from text, image, video, and audio inputs, grounded in Gemini's real-world knowledge. Iterate through natural conversation, with every edit building on the last while keeping your scene consistent. It's built for real production work, not just isolated clips.

Storyboard to finished video

Upload a storyboard image and get a full video with dialogue and narrative.

Character-led storytelling

Turn a single character image into a multi-scene story with narration and captions.

Background & setting swaps

Drop a subject into a new environment, like a Bollywood dance scene.

Location-aware generation

Generate regionally accurate output from real places or coordinates, with local signage and native language.

Natural-language editing

Change one segment with a follow-up instruction, like swapping Sydney for the Taj Mahal.

Production at scale

Turn a few assets into many production-grade outputs fast.

How it compares

Where Gemini Omni sits next to other leading video models. Its edge is conversational, multi-turn editing on top of generation, grounded in Gemini's real-world knowledge.

| Model | Gemini Omni | Veo 3.1 | Kling VIDEO O3 | Seedance 2.0 | HappyHorse 1.0 |
| --- | --- | --- | --- | --- | --- |
| Input types | Text, image, audio, video | Text, image | Text, image, audio | Text, image, audio, video | Text, image |
| Native audio | Yes | Yes | Yes | Yes | No |
| Max resolution | 1080p (4K reportedly coming with Pro) | 4K | 4K | 1080p | 1080p |
| Max duration | 10s | 8s | 15s | 15s | 15s |
| Aspect ratios | 16:9, 9:16 | 16:9, 9:16 | 16:9, 9:16, 1:1 | 16:9, 9:16, 1:1, 4:3, 3:4, 21:9 | 16:9, 9:16, 1:1, ~4:3, ~3:4 |
| Best fit | Iterative, reference-based editing across mixed inputs | Cinematic generation, high-resolution output | Long-form, character-driven content in 4K | Audio-visual sync, multi-reference composition | High-quality no-audio generation |
| Capabilities | Conversational multi-turn editing, real-world knowledge & physics, character & scene consistency, generation & editing in one model | Real-world knowledge & physics, character & scene consistency; limited multi-turn editing and unified generation & editing | Character & scene consistency; limited multi-turn editing, real-world knowledge, and unified generation & editing | Character & scene consistency; limited real-world knowledge and unified generation & editing | Limited real-world knowledge and character consistency |

Specs and capabilities are indicative and evolve as each model updates.

Frequently asked questions

What is Gemini Omni?

Google's multimodal model that generates and edits video from text, image, video, and audio inputs, grounded in Gemini's real-world knowledge. In Google's own shorthand: “Think of Gemini Omni like Nano Banana, but for video.” It's a conversational, iterative editing workflow, not just a generation tool.

What makes Gemini Omni different?

Conversational, multi-turn editing: each change builds on the last while the scene stays consistent. It also has an intuitive grasp of real-world physics and draws on Gemini's knowledge of history, science, and culture.

What workflows does it support?

Text-to-video, image-to-video, reference-to-video, and multi-turn video editing, all through one Runware integration.

How does Gemini Omni compare to other models?

Most video models focus on single-shot generation. Gemini Omni adds conversational, iterative editing on top of generation, so you can refine a scene over multiple turns instead of regenerating from scratch, while keeping characters and continuity consistent. On Runware it sits alongside other leading video models, so you can pick the right one for each task through the same API.

How much does Gemini Omni cost?

Gemini Omni isn't available on the platform yet, so pricing hasn't been published. Once it launches, rates will be listed on the Runware pricing page alongside our other video models. Contact us to be notified when it goes live.

Do I need separate integrations?

No. Generation and editing run through the same API.

When will Gemini Omni be released?

Gemini Omni is coming soon and isn't generally available yet. Request access through the form above, and we'll let you know as soon as you can start using it.

Talk to us about volume discounts

Have questions about Gemini Omni? Chat to our team about enterprise usage, including volume discounts and dedicated RPM, and we'll follow up shortly.