Gemini Omni Flash

Live now on Runware

Google's multimodal video model for generation and conversational editing

Access Gemini Omni Flash, Google's latest video generation model.

Try in Playground

All workflows supported natively.

Text-to-video · Image-to-video · Reference-to-video · Multi-turn video editing

One integration for generation and editing, with no separate endpoints.

Why Gemini Omni Flash stands out

Grounded in real-world knowledge

An intuitive grasp of physics, combined with Gemini's knowledge of history, science, and culture, for believable scenes.

Location & world awareness

Understands real places and generates regionally accurate settings, down to local signage and language.

Character consistency across scenes

One reference image keeps a character or product consistent across scenes, angles, and lighting.

Character & object swaps

Replace characters or objects just by asking. Motion and dialogue carry over seamlessly.

Motion & style transfer

Apply motion from a source clip to a reference character, or restyle footage without regenerating the scene.

Synced text and audio

On-screen text and sound that connect coherently to what's happening in the frame.

Built for real workflows

Access Gemini Omni Flash, Google's multimodal model that creates and edits video from text, image, video, and audio inputs, grounded in Gemini's real-world knowledge. Iterate through natural conversation, with every edit building on the last while keeping your scene consistent. It's built for real production work, not just isolated clips.

Storyboard to finished video

Upload a storyboard image and get a full video with dialogue and narrative.

Character-led storytelling

Turn a single character image into a multi-scene story with narration and captions.

Background & setting swaps

Drop a subject into a new environment, like a Bollywood dance scene.

Location-aware generation

Generate regionally correct outputs from real places, coordinates, signage, and native language.

Natural-language editing

Change one segment with a follow-up instruction, like swapping Sydney for the Taj Mahal.

Production at scale

Turn a few assets into many production-grade outputs fast.

How it compares

Where Gemini Omni Flash sits next to other leading video models. Its edge is conversational, multi-turn editing on top of generation, grounded in Gemini's real-world knowledge.

Model	Gemini Omni Flash	Veo 3.1	Kling VIDEO O3	Seedance 2.0	HappyHorse 1.0
Input types	Text, image, audio, video	Text, image	Text, image, audio	Text, image, audio, video	Text, image
Native audio	Yes	Yes	Yes	Yes	No
Max resolution	1080p (4K reportedly coming with Pro)	4K	4K	1080p	1080p
Max duration	10s	8s	15s	15s	15s
Aspect ratios	16:9, 9:16	16:9, 9:16	16:9, 9:16, 1:1	16:9, 9:16, 1:1, 4:3, 3:4, 21:9	16:9, 9:16, 1:1, ~4:3, ~3:4
Best fit	Iterative, reference-based editing across mixed inputs	Cinematic generation, high-resolution output	Long-form, character-driven content in 4K	Audio-visual sync, multi-reference composition	High-quality no-audio generation
Capabilities	Conversational multi-turn editing, real-world knowledge & physics, character & scene consistency, generation & editing in one model	Real-world knowledge & physics, character & scene consistency; limited multi-turn editing and unified generation & editing	Character & scene consistency; limited multi-turn editing, real-world knowledge, and unified generation & editing	Character & scene consistency; limited real-world knowledge and unified generation & editing	Limited real-world knowledge and character consistency

Specs and capabilities are indicative and evolve as each model updates.
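As a rough illustration of how the table above could be used to shortlist models for a job, the sketch below encodes three of its rows (input types, max duration, aspect ratios) and filters on them. The `VideoModel` class and `shortlist` function are hypothetical names invented for this example and are not part of any Runware or Google API. The values are copied from the table and are just as indicative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoModel:
    name: str
    inputs: frozenset
    max_seconds: int
    aspect_ratios: frozenset


# Values transcribed from the comparison table; indicative only.
MODELS = [
    VideoModel("Gemini Omni Flash", frozenset({"text", "image", "audio", "video"}),
               10, frozenset({"16:9", "9:16"})),
    VideoModel("Veo 3.1", frozenset({"text", "image"}),
               8, frozenset({"16:9", "9:16"})),
    VideoModel("Kling VIDEO O3", frozenset({"text", "image", "audio"}),
               15, frozenset({"16:9", "9:16", "1:1"})),
    VideoModel("Seedance 2.0", frozenset({"text", "image", "audio", "video"}),
               15, frozenset({"16:9", "9:16", "1:1", "4:3", "3:4", "21:9"})),
    # The table lists ~4:3 and ~3:4 as approximate, so they are kept verbatim
    # and will not match an exact "4:3" or "3:4" request.
    VideoModel("HappyHorse 1.0", frozenset({"text", "image"}),
               15, frozenset({"16:9", "9:16", "1:1", "~4:3", "~3:4"})),
]


def shortlist(required_inputs, seconds, aspect_ratio):
    """Return names of models that accept all required inputs, can produce
    a clip of at least `seconds`, and support `aspect_ratio`."""
    required = frozenset(required_inputs)
    return [
        m.name
        for m in MODELS
        if required <= m.inputs
        and m.max_seconds >= seconds
        and aspect_ratio in m.aspect_ratios
    ]


print(shortlist({"text", "image", "video"}, 10, "16:9"))
# ['Gemini Omni Flash', 'Seedance 2.0']
```

The filter only covers hard constraints. The "Best fit" and "Capabilities" rows, such as multi-turn editing, still need a human judgment call.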

Frequently asked questions

What is Gemini Omni Flash?

Google's multimodal model that generates and edits video from text, image, video, and audio inputs, grounded in Gemini's real-world knowledge. In Google's own shorthand: “Think of Gemini Omni Flash like Nano Banana, but for video.” It's a conversational, iterative editing workflow, not just a generation tool.

What makes Gemini Omni Flash different?

Conversational, multi-turn editing, where each change builds on the last while keeping the scene consistent. It also has an intuitive grasp of real-world physics and draws on Gemini's knowledge of history, science, and culture.

What workflows does it support?

Text-to-video, image-to-video, reference-to-video, and multi-turn video editing, all through one Runware integration.

How does Gemini Omni Flash compare to other models?

Most video models focus on single-shot generation. Gemini Omni Flash adds conversational, iterative editing on top of generation, so you can refine a scene over multiple turns instead of regenerating from scratch, while keeping characters and continuity consistent. On Runware it sits alongside other leading video models, so you can pick the right one for each task through the same API.

How much does Gemini Omni Flash cost?

Input tokens (text, image, video, and audio) are billed at $1.50 per 1M tokens. Text output costs $9 per 1M tokens, and video output costs $0.10 per second of 720p video. For input, Google counts 2,040 tokens per image, 5,792 tokens per second of video, and 32 tokens per second of audio. Video output is also metered at 5,792 tokens per second. Full pricing is listed on the Runware pricing page.
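To make those rates concrete, here is a minimal estimator sketch built only from the figures quoted above. The function name and parameters are hypothetical and not part of any Runware SDK. It assumes video output is billed at the flat per-second 720p price rather than separately by tokens, and it does not cover other resolutions, since no rate for them is quoted here.

```python
# Rates quoted in the FAQ answer above; a snapshot, not a live price list.
INPUT_USD_PER_MILLION_TOKENS = 1.50
TEXT_OUTPUT_USD_PER_MILLION_TOKENS = 9.00
VIDEO_OUTPUT_USD_PER_SECOND_720P = 0.10

TOKENS_PER_INPUT_IMAGE = 2040
TOKENS_PER_INPUT_VIDEO_SECOND = 5792
TOKENS_PER_INPUT_AUDIO_SECOND = 32


def estimate_cost_usd(prompt_tokens=0, images=0, input_video_seconds=0.0,
                      input_audio_seconds=0.0, text_output_tokens=0,
                      output_video_seconds=0.0):
    """Estimate the cost of one request in USD, broken down by component.

    Assumption: 720p video output is charged per second of video only.
    """
    input_tokens = (
        prompt_tokens
        + images * TOKENS_PER_INPUT_IMAGE
        + input_video_seconds * TOKENS_PER_INPUT_VIDEO_SECOND
        + input_audio_seconds * TOKENS_PER_INPUT_AUDIO_SECOND
    )
    input_cost = input_tokens * INPUT_USD_PER_MILLION_TOKENS / 1_000_000
    text_cost = text_output_tokens * TEXT_OUTPUT_USD_PER_MILLION_TOKENS / 1_000_000
    video_cost = output_video_seconds * VIDEO_OUTPUT_USD_PER_SECOND_720P
    return {
        "input": input_cost,
        "text_output": text_cost,
        "video_output": video_cost,
        "total": input_cost + text_cost + video_cost,
    }


# Image-to-video: a 200-token prompt, one reference image, 10 s of 720p output.
estimate = estimate_cost_usd(prompt_tokens=200, images=1, output_video_seconds=10)
print(f"{estimate['total']:.5f}")  # 1.00336
```

In this example the output video accounts for almost all of the cost: the 2,240 input tokens add about a third of a cent. Input cost matters more when source video is supplied, since each second of input video counts as 5,792 tokens.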

Do I need separate integrations?

No. Generation and editing run through the same API.

Is Gemini Omni Flash available now?

Yes. Gemini Omni Flash is live on Runware. You can try it directly in the Playground using your existing Runware credentials and billing.

Talk to us about volume discounts

Have questions about Gemini Omni Flash? Chat to our team about enterprise usage, including volume discounts and dedicated RPM, and we'll follow up shortly.