Google Veo 3.1

Google Veo 3.1 cinematic AI video with native audio

Commercial use

Each generation costs $0.20/s for 720p without audio, or $0.40/s for 720p with audio.

720p · 4s (without audio): $0.80
720p · 4s (with audio): $1.60
720p · 8s (without audio): $1.60
720p · 8s (with audio): $3.20
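
The listed prices follow directly from the per-second rates. A minimal sketch of that arithmetic, assuming only the two 720p rates quoted above; the function and constant names are illustrative and not part of any API:

```python
# Per-second rates for 720p, taken from the pricing text above.
# Keyed by whether audio is generated.
RATE_PER_SECOND = {False: 0.20, True: 0.40}


def generation_cost(duration_s: int, with_audio: bool) -> float:
    """Return the cost in USD of one 720p generation of the given length."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    # Round to cents to avoid floating-point noise in the product.
    return round(duration_s * RATE_PER_SECOND[with_audio], 2)


if __name__ == "__main__":
    for seconds in (4, 8):
        for audio in (False, True):
            label = "with audio" if audio else "without audio"
            print(f"720p, {seconds}s ({label}): ${generation_cost(seconds, audio):.2f}")
```
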
Text To Video · Image To Video · Audio To Video

Google Veo 3.1 is a cinematic video generation model for developers. It turns text prompts or reference images into high-fidelity scenes with richer native audio, better prompt adherence, and granular shot control. Use it for story-driven clips with smoother motion and consistent style.

README

Overview

Google Veo 3.1 is an advanced AI video generation model that turns natural language descriptions and optional image references into cinematic, story-driven video clips with rich, native audio. It’s built for creators, developers, and storytellers who need high-quality video output without manual animation or rendering.

Veo 3.1 enhances realism, motion coherence, and audiovisual coordination compared to earlier versions, enabling content that feels more immersive and expressive. It supports a range of creative workflows and is designed for rapid prototyping, concept visualisation, and creative storytelling.

How it Works

Veo 3.1 combines several generative techniques to produce cohesive video output from text and images:

Prompt Interpretation

The model parses your natural language prompt to understand subjects, actions, environments, camera movements, and audio cues.

Video Synthesis

A specialised temporal generation pipeline produces sequences of frames that maintain continuity and fluid motion, which keeps transitions smooth and the visual composition consistent.

Audio Generation

Native audio tracks, including ambience, music, and sound effects, are generated to align with the visual content, enhancing mood, pacing, and immersion.

Key Features

  • Text-to-Video and Image-to-Video
    Create videos directly from descriptive prompts or use reference images to guide the visual style and composition.
  • Reference Image Support
    Use up to three asset images or a single style image to influence video content. Reference images must use an aspect ratio that matches a supported video output.
  • Frame Anchoring
    Provide first and last frame images to guide motion and narrative direction.
  • Audio Synchronisation
    Generate audio that matches the rhythm and mood of the visuals without separate audio tools.
  • Consistent Motion
    Designed to keep motion and transitions smooth across all frames in a clip.

Technical Specifications

  • Model ID: google:3@2
  • Workflows Supported: Text-to-video, Image-to-video
  • Supported Resolutions: 1280×720, 1920×1080 (standard and vertical where applicable)
  • Frame Rate: 24 FPS
  • Default Duration: 8 seconds
  • Prompt Length: Typically up to 3000 characters
  • Reference Image Constraints: Aspect ratios must match a supported video output; up to three asset images or one style image; reference images cannot be combined with frame image guidance
  • Enhanced Prompting: Always enabled to enrich user prompts for quality results
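
The constraints above can be checked on the client before a request is sent. The following is a rough sketch under stated assumptions: the helper name and its parameters are hypothetical, the vertical sizes (720×1280 and 1080×1920) are an assumed reading of "vertical where applicable", and the 3000-character limit is treated as a hard cap even though the specification describes it as typical.

```python
# Limits taken from the specification list above.
MAX_PROMPT_CHARS = 3000
MAX_ASSET_IMAGES = 3
MAX_STYLE_IMAGES = 1

# Landscape sizes come from the spec; the vertical variants are an assumption.
SUPPORTED_RESOLUTIONS = {(1280, 720), (1920, 1080), (720, 1280), (1080, 1920)}


def validate_request(prompt: str, width: int, height: int,
                     asset_images: int = 0, style_images: int = 0,
                     frame_images: int = 0) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    if not prompt.strip():
        problems.append("prompt is empty")
    if len(prompt) > MAX_PROMPT_CHARS:
        problems.append(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if (width, height) not in SUPPORTED_RESOLUTIONS:
        problems.append(f"unsupported resolution {width}x{height}")
    if asset_images > MAX_ASSET_IMAGES:
        problems.append(f"more than {MAX_ASSET_IMAGES} asset images")
    if style_images > MAX_STYLE_IMAGES:
        problems.append(f"more than {MAX_STYLE_IMAGES} style image")
    if asset_images and style_images:
        problems.append("asset and style images cannot be combined")
    if (asset_images or style_images) and frame_images:
        problems.append("reference images cannot be mixed with frame images")
    return problems
```

An empty result only means the request passes these local checks; the hosting API remains the authority on what it accepts.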

How to Use

  1. Write a clear prompt describing the scene, motion, camera style, and any audio cues.
  2. Choose your input style: text only, reference images, or start and end frame guidance.
  3. Send the request to the API or platform where Veo 3.1 is hosted.
  4. Retrieve the generated video output.

Example prompt:
A lively urban plaza at sunset, slow tracking camera circling around dancers, ambient street sounds with distant music.
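
The steps above could be sketched as a request payload built in Python. Only the model ID, default duration, and 720p size come from this page. Every field name (`taskType`, `taskUUID`, `positivePrompt`, and so on) is an assumption for illustration and should be checked against the documentation linked at the end of this page. The sketch builds the payload only and does not send it.

```python
import json
import uuid

MODEL_ID = "google:3@2"  # Model ID from the Technical Specifications section.


def build_veo_task(prompt: str, duration: int = 8,
                   width: int = 1280, height: int = 720) -> dict:
    """Build a text-to-video task description.

    All keys below are assumed names, not a confirmed request schema.
    """
    return {
        "taskType": "videoInference",   # assumed task type name
        "taskUUID": str(uuid.uuid4()),  # client-generated ID to match results
        "model": MODEL_ID,
        "positivePrompt": prompt,       # assumed prompt field name
        "duration": duration,           # seconds
        "width": width,
        "height": height,
    }


if __name__ == "__main__":
    task = build_veo_task(
        "A lively urban plaza at sunset, slow tracking camera circling "
        "around dancers, ambient street sounds with distant music."
    )
    # Print the JSON body that would be sent to the hosting API.
    print(json.dumps([task], indent=2))
```

Keeping payload construction separate from the network call makes it easy to inspect or validate the request before spending credits on a generation.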

Tips for Better Results

  • Start with the main subject, then layer in environment and motion.
  • Add mood, lighting, and audio cues towards the end of the prompt.
  • Use reference or frame images for tighter control over composition and motion.

Notes & Limitations

  • Veo 3.1 is optimised for short, high-quality video clips.
  • Longer narratives may require multiple generations or stitched outputs.
  • Image and aspect ratio constraints apply when using reference inputs.
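
For a longer narrative, the total runtime has to be divided into several short generations. A minimal sketch of one way to plan that, assuming only the 4s and 8s clip lengths shown in the pricing list are available; the function name is hypothetical, and stitching the resulting clips together is left to an external video tool.

```python
def plan_clips(total_seconds: int) -> list[int]:
    """Split a target runtime into 8s and 4s clips that cover at least that runtime."""
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    clips = []
    remaining = total_seconds
    while remaining > 0:
        # Prefer the longer clip; use a 4s clip only when it covers what is left.
        length = 8 if remaining > 4 else 4
        clips.append(length)
        remaining -= length
    return clips


if __name__ == "__main__":
    # A 20-second story becomes two 8s clips and one 4s clip.
    print(plan_clips(20))
```

The plan rounds up, so a runtime that is not a multiple of 4 seconds yields slightly more footage than requested, which can be trimmed during editing.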

Documentation

You can find full usage details, parameters, and examples here: https://runware.ai/docs/en/providers/google#veo-31